Wednesday, 5 August 2009

Curated databases and data curation

I've been working on an article with a colleague, and came across something interesting that you might be able to help us with. There does appear to be a distinction between the way the term curation is used in the bio-sciences and the way it is used elsewhere. In particular, the term "curated database" tends to mean a manually constructed database that links literature to data, curated by experts who provide authority (e.g. see the Wikipedia definition of Biocurator). The earliest mention of the term "curated database" I can find is in the abstract (and only in the abstract) of Larsen et al (1993).

However, the terms "digital curation" and "data curation" tend to mean something different. We in the DCC say "Digital curation is maintaining and adding value to a trusted body of digital information for current and future use; specifically, we mean the active management and appraisal of data over the life-cycle of scholarly and scientific materials". This has a lot more elements of good management about it, and less of the idea of specific curators making judgements based on the literature. It has also been rather conflated with digital preservation.

The earliest reference to digital curation I can find is a report of an invitational meeting held in October 2001, oddly titled "The Digital Curation: digital archives, libraries and e-science seminar" (Beagrie and Pothen, 2001). In the meeting there was some discussion about data curation. The earliest more formal reference to data curation I can find is a technical report from Gray, Szalay et al (2002).

So my challenge is this: are there earlier references to digital (or data) curation, of the second kind?

Beagrie, N., & Pothen, P. (2001). The Digital Curation: digital archives, libraries and e-science seminar. Ariadne.

Gray, J., Szalay, A. S., Thakar, A. R., Stoughton, C., & vandenBerg, J. (2002). Online Scientific Data Curation, Publication, and Archiving. Redmond: Microsoft Research.

Larsen, N., Olsen, G. J., Maidak, B. L., McCaughey, M. J., Overbeek, R., Macke, T. J., et al. (1993). The ribosomal database project. Nucl. Acids Res., 21(13), 3021-3023.


  1. Try:

    Data Management: Managing Electronic Information: Data Curation in Museums, Diane Zorich, Museum Management and Curatorship (1995)

    Data and metadata collections for scientific applications, High Performance Computing and Networking, Lecture Notes in Computer Science vol 2110, pg 72

  2. Shawn, thanks. That is a really great reference, once I'd tracked it down. Zorich cites a 1994 DoE report on a 1993 meeting (the DoE Informatics Summit, Baltimore, MD, 26-27 April 1993; U.S. Department of Energy, Washington DC), which I have not yet found. However, Zorich extends DoE's curation metaphor, itself based on the bio-informatics example that I've mentioned, and applies it to completely different datasets (in this case museum "catalogues", but clearly more generic than that). She identifies characteristics of consistency, long-term quality and relevance over time, and points out, among other things, that changes require verification and authentication. It is a really nice article.

    Zorich, D. (1995). Data Management: Managing Electronic Information: Data Curation in Museums. Museum Management and Curatorship, 14(4), 430-432.

  3. Looks like this is the DOE report...

