'The Future of Curation Costs' by Heiko Tjalsma

“Of course when the target of a particular activity is the long term, and you’re working in a world that has difficulty looking beyond next year’s financial results and/or budget, it’s always going to be a financial stretch” said Paul Wheatley on digital preservation costing in his earlier blog.

This reminded me of my first encounter with the topic “cost models for digital curation”. A number of years ago I was at a workshop in Paris which was in fact more about digital publishing than anything else. A discussion arose between two schools of thought. One defended the idea that long-term digital archiving would be very expensive in the short run, but that once it was up and running it would become steadily cheaper. The other argued exactly the opposite: it was not particularly expensive at the start, but it would gradually, and “beyond next year’s financial results”, become more and more expensive.

Were they talking in Paris about setting up a repository and expanding its services, or about an ever-growing number of digital objects, databases etc. in an already existing repository? I no longer remember the exact arguments, but it gives me an opportunity to emphasise the point I would like to make here. That is, when talking about the costing of digital curation, whether for this year or for a much longer period, you have to be precise. Digital curation is a field of many misunderstandings, as even the word itself can be understood differently (see the comments on the blog by Ulla Bøgvad Kejser). This tends to happen in particular when talking about costs, for example when comparing the costs of digital preservation with those of analogue, paper preservation.

Digital repositories have recently become more worried about the latter point: the rising numbers. They seem to relate this directly to the rising costs of storage. Until now these growing storage costs were seen as more or less negligible, because personnel costs are so much higher than storage costs. A typical quote from 2003 by Jim Gray (head of Microsoft's Bay Area Research Center) puts it this way: “The cost of backup/restore, archive, reorganize, growth, and capacity management seems to dwarf the cost of the iron.”

After long years of only talking about digital preservation and raising awareness of it, things seem to be getting more serious, more down to business. By this I mean that repositories are now actually filling up with rising quantities of digital material, which leads to more nervousness about costs. I notice, at any rate, increasing talk of “business plans” among heritage institutions. These business plans need a solid foundation, and for that you need insight into all your costs.

But are the present cost models solid enough to build your business plans on? Are they precise enough, and is it possible to link them easily with the working processes in a repository, and in particular with the financial (book-keeping) systems of the different institutions operating digital repositories?

There are still big hurdles here, to my mind. I hope and expect that the 4C project can help, especially its Curious Costs Curation Exchange Machine. I guess some of the present, rather different, cost models will remain with us for some time to come. It would already be a great step forward if we knew exactly what these cost models do, by using standardized definitions.

It would make business plans more transparent. It would make it easier for repositories to specify their costs to their funders and to make well-founded decisions on which costs could be earned back.

It would also help in other areas which, I think, will soon become prominent. One of them is mentioned by Raivo Ruusalepp in his blog last week: trying to “invest to save”. Another is to pay more attention to the appraisal and selection of data. Which parts of the data really have to be kept for the long term: what is their value for scientific research or cultural heritage? Costs will not be the decisive element in all circumstances, as there will be legal or contractual obligations as well, but I am sure that this issue will get more attention. And it will always be linked to the costs.

Heiko Tjalsma is a senior policy advisor at DANS - Data Archiving and Networked Services, an institute of the Royal Netherlands Academy of Arts and Sciences (KNAW) and the Netherlands Organisation for Scientific Research (NWO).

Heiko is involved in the 4C project, in particular in the task of building the Costs Curation Exchange Machine (CCEx).