'ANADP II Action Session - 4C Case Studies and Quantitative Data Session' by Neil Grindley and Raivo Ruusalepp

Digital preservation is the journey

Raivo Ruusalepp

Thomas Bayrle (1976), NDR (detail)

I came to a halt in front of this drawing when I was walking around the Museu d’Art Contemporani de Barcelona following a busy few days at the ANADP II conference. It stopped me in my tracks and made me want to photograph it for two reasons. Firstly, it was well drawn and conceived; secondly, it reminded me of something that my colleague Raivo had said during the conference.

‘Digital preservation is the journey’, he said, and this piece of art magically represented that fresh little aperçu. I imagined these tiny cars and trains whizzing along with no destination in sight as the equivalent of archival information packages (AIPs) migrated and replicated over networks, the drawn corollary of digital preservation formulated as a non-strategic, directionally disinterested, timeless, inexorable and indefinitely persistent activity.

But let’s take a few steps back and talk about the Action Session at the ANADP conference that first provoked Raivo’s remark. Our objective during the session was to give a short introduction to the 4C Project (a Collaboration to Clarify the Costs of [Digital] Curation) and then to work with the participants to find out how they might go about dividing up various curation activities into cost categories. An annotated version of the post-session summary presentation is on Slideshare, which should give you a quick overview of how the session proceeded.

http://www.slideshare.net/neilgrindley/anadp-4-c-action-session-summary-annotated

What you’ll see is a series of exercises where we began by asking people to break down a journey that they were very familiar with into a series of waypoints or significant moments. We then asked them to split up that process/activity/journey into its component cost categories. The purpose of doing this was to get people thinking creatively and informally about the things they do and perhaps don’t usually think too hard about. The next step was to ask them to think about the journey that a digital asset takes through time and the costs that are associated with that journey.

The results of the exercise were certainly interesting. But they weren’t easy to digest. And if I’m honest, I’m still not entirely sure what to do with them!

We did the exercise because the 4C Project has an objective to design and deliver what we are calling a Curation Costs Exchange (CCEx). This will be a platform that we hope will enable stakeholders from different sectors to share and compare their levels of investment in digital curation. We believe that this will help to highlight efficiencies, benchmark practice, and manage expectations about the costs of curation, and ultimately, we hope, it will help organisations to refine their curation strategies and tactics. It might even help service and solutions providers to design curation tools that are more economically aligned with stakeholders’ capacity to pay for them.

So what has this got to do with the ANADP session?

In order to design the Curation Costs Exchange, we first have to examine and devise ways of making sure that some level of comparability is possible across the different ways that organisations and sectors undertake digital curation and how they cost it. And to do this, we have to understand how actual people in real jobs (rather than theoretical people in assumed roles) go about their work. This is why we decided to try and get people to talk in a relaxed way about how they really look after digital assets rather than remind them from the outset of things like the OAIS Reference Model (http://en.wikipedia.org/wiki/Open_Archival_Information_System); or the DCC Lifecycle Model (http://www.dcc.ac.uk/resources/curation-lifecycle-model); or any other aspirational framework.

What did the answers look like?

Even though there were only 15 sets of meaningful answers returned, I don’t have space in a blog post to set it all out in detail. But suffice it to say there was a whole range of approaches and a bewildering array of priorities set out. There were some things that almost everyone mentioned, but what was more surprising (to me at any rate) was the casual way that many people just said that they did ‘curation’ or ‘preservation’, as if these were well-understood activities that you just got on with, and it was an array of other issues that were challenging and complicated and perhaps cost you money that you didn’t expect or didn’t want to spend.

Like what?

Ten out of the fifteen respondents mentioned training and skills issues, as an integral part of the sequence of looking after digital assets, as a cost category, or both. There was a noted focus on the provision of access to resources and what that cost. The promotion and marketing of curated assets was something that people said they needed to invest in, which also tied in with some emphasis on taking time to understand and anticipate user requirements. A minor but useful note to emerge was the investment required in maintaining an understanding of legal issues.

Putting all the written text together and dumping it into a text file produces a small corpus that represents the thoughts of the 15 participants in terms of the journey that digital assets take through their organisation; and the areas where investment is needed in order to sustain that journey and curate the asset. 

 

I think for the purposes of this blog post the Wordle™ above will be a good enough substitute for more sophisticated forms of text analysis. As you can see, the largest and most commonly stated words are the sort of things you would expect.

  • Infrastructure,
  • software,
  • hardware,
  • data,
  • preservation,
  • staff,
  • access, etc.

The lower-order words are perhaps the more interesting ones: they point to areas that presumably still incur costs at levels which may be significant for some organisations, e.g. maintenance, assessment, research, project, processing, etc.
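For readers who want to go one step beyond a word cloud, the same kind of word count can be sketched in a few lines of Python. This is only a minimal illustration and not the analysis used for the session: the function name `word_frequencies`, the short stopword list and the sample sentence are all assumptions invented for the example, and a real corpus would need a fuller stopword list.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "and", "of", "to", "a", "in", "for", "is", "that", "we"}


def word_frequencies(text, stopwords=STOPWORDS):
    """Count how often each word occurs in text, ignoring case and stopwords."""
    # Lower-case the text and keep runs of letters only.
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in stopwords)


# Invented sample text standing in for the participants' responses.
sample = "Staff costs and software costs; hardware, software and staff training."
freq = word_frequencies(sample)

# List the words from most to least frequent, as a word cloud would size them.
for word, count in freq.most_common():
    print(word, count)
```

Sorting by count in ascending order instead would surface the lower-order words first.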

The information that we harvested at the workshop will provide us with real data that we can turn into use cases for the Curation Costs Exchange, and also into workflows that we can try mapping onto the CCEx comparability framework.

As for the discussion we also had during the session, I will simply paste in below the bulleted text that I presented in plenum afterwards.

  • To what extent can digital curation adopt the approach where we benchmark against an ‘industry standard’ (e.g. insurance comparison sites)?
  • Larger organisations don’t need a Curation Costs Exchange – the value of it will be for smaller organisations who need to understand cost structures for planning their budgets
  • OpenAIRE [an EC-funded project] is doing a costs/benefits analysis of running an institutional repository [this is something 4C needs to follow up on]
  • 4C should explain terminology and let people combine their cost structures themselves
  • We need a very concise way of bringing out the benefits of investing in digital curation. The costs should always be linked to the benefits. Focusing on the return on investment is vital
  • We need to acknowledge that calculating the cost of curation is a significant cost in itself
  • Data usage (web stats etc.): any quantitative figures for data use are very important

And amongst those other points and recommendations, I think the idea that came across most strongly to both Raivo and me was the insistence of the group that identifying and articulating the benefits of curation/preservation was at least as important as (and possibly more important than) quantifying the costs.

And finally, back to the idea of ‘digital preservation is the journey’. I do quite like it because it fits with the idea that digital preservation is a ‘derived demand’ [i], which means that people are not interested in it for its own sake but are willing to pay for it because it enables more tangible opportunities down the line, i.e. access to resources, use and re-use, etc. In the same way, one presumably takes a journey to get somewhere (rather than for its own sake), but there are useful measures that one can take along the way to make that journey more affordable, more efficient, safer, more pleasant etc.

But this does miss something ... something quite important, which ties in with the group’s insistence on articulating the benefits. If we only focus on the journey and not the destination, then we risk missing the opportunity to insist that digital preservation and curation have a role to play in the area of organisational planning and strategy. Perhaps we can finesse this by saying ‘digital preservation is the journey, sustainability is the destination’.

Neil Grindley is the Coordinator for the 4C Project and, when he isn't doing that, he is a Programme Manager at Jisc, the UK's expert organisation on digital technologies for education and research. http://www.jisc.ac.uk

Raivo Ruusalepp, National Library of Estonia (NLE), leads WP4 ‘Enhancement.’ The NLE will test the Pilot Economic Sustainability Reference Model developed for the project and will act as a case study subject for trust, risks and viability case studies in WP4.



[i] This is terminology borrowed from the Blue Ribbon Task Force on Sustainable Digital Preservation and Access: http://brtf.sdsc.edu/