Friday, 13 February 2009

Repositories and the web

There has been an interesting series of posts on repository architectures, implementations and implications by Andy Powell on the eFoundations blog. He started by wondering whether multiple deposit diminished Googlejuice. When I used an article of mine in the comments to illustrate that Google coped better than expected (in some cases at least, as we were to discover), he had a look at the relevant DSpace-based repository and reported his horror at what he saw. A later post looking at an ePrints-based repository showed continuing, if slightly lesser, concern. Both posts attracted multiple comments, including an interesting extempore sketch of an OAI-ORE-based solution to one identified problem, by Herbert Van de Sompel.

The discussion then kicked across into a closed JISC repository discussion list. It's not fair to report that conversation here, but I can at least quote from one of my comments:
"Part of Andy's original point (he's been saying this a lot), was that the repository platforms we use (DSpace, ePrints et al) are not good web-natives. He did a second post looking at [an] ePrints-based site [...], which showed some improvement over DSpace, but some related issues.

[...] Let's remember the list of problems that Andy spotted (in only a cursory examination; there may be more). He's at the DSpace splash page for my item:
  • a) PDF instead of (or not as well as) HTML for the deposited item
  • b) confusion over 4 different URIs
  • c) wrong Title, so Delicious etc won't work well
  • d) no embedded metadata in the HTML
  • e) no cross-linking of keywords (cf Flickr tags)
  • f) ditto author, publisher
  • g) unhelpful link text for PDF and additional material etc.
I think if those responsible for these repositories were building them today as web sites, they would not have constructed them with (all of) those deficiencies. (Some are, I think, matters of design choice, or at least in the vanguard of web thinking, eg (e).) But the repository platform brings (or is intended to bring) advantages of workflow, scalability, additional features such as OAI-PMH if it is actually useful, etc. It's supposed to save the repository manager the effort of building a purpose-built web site, and to provide some consistency across different repositories, as opposed to the general confusion of University web sites.

So one approach, as I suggested in an earlier email, is simply to invest to improve the repository platforms, so that they are better web natives, using up to date technologies, and so present a more appropriate web presence. Many of those issues could be solved with some software effort, benefiting everyone who uses DSpace (and similar effort could improve ePrints etc). Just add money and time.

But the multi-URI thing is a bit more of a problem; it sounds like the repository model we use is conceptually broken here. Herbert's solution sounds like a major design change for the repository world: move from depositing items in a FRBR sense (I know Andy has some issues with this) to depositing something much more akin to works, instantiated as Resource Maps. I think this is sufficiently different that there would be some serious debate about it. It certainly sounds like it would make sense, although I'm not sure how much good it will do in Googlejuice terms (Google dropped the OAI-PMH maps, so no huge reason to believe they will be interested in OAI-ORE Resource Maps per se, although this may not be the issue).

It would be very difficult for repository managers to craft those Resource Maps by hand in today's world (although clearly Oxford is doing SOMETHING with them). However, it does sound like the sort of thing that repository platforms could do fairly automatically for you. But again there would be significant development work involved. So in this case, we first have an effort to refine Herbert's sketch to a shared view across the repository world, followed by an effort to implement in major platforms. So add money twice, and more time.

Of course, Herbert's probably built it by now!

So my proposal to JISC would be:
  • invest in upgrading the common UK repository platform software for better web-nativeness (!)
  • invest in an effort along the lines of previous SWAP/CRIG approaches to get consensus on the Resource Map approach
  • invest in upgrading the common UK repository platform software to support this approach."
I'm pretty sure some of these issues are also part of what reduces the usefulness of these platforms for data sharing.
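
To make the Resource Map idea a little more concrete, here is a minimal sketch of what a platform might generate automatically for one deposited work. It is only an illustration of the general OAI-ORE pattern (a Resource Map describes an Aggregation, which aggregates the individual resources), not Herbert's actual sketch, which is not reproduced here. The function names and the example.org URIs are hypothetical, and the output omits metadata that a complete Resource Map would be expected to carry, such as its creator and modification date.

```python
ORE = "http://www.openarchives.org/ore/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def triple(subject, predicate, obj):
    """Format one N-Triples statement whose three terms are all URIs."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def resource_map(rem_uri, aggregation_uri, resource_uris):
    """Return N-Triples lines for a bare-bones Resource Map.

    The map describes one aggregation (roughly, the "work"), which in
    turn aggregates each URI the repository exposes for that item.
    """
    lines = [
        triple(rem_uri, RDF_TYPE, ORE + "ResourceMap"),
        triple(aggregation_uri, RDF_TYPE, ORE + "Aggregation"),
        triple(rem_uri, ORE + "describes", aggregation_uri),
    ]
    for uri in resource_uris:
        lines.append(triple(aggregation_uri, ORE + "aggregates", uri))
    return lines


if __name__ == "__main__":
    # Hypothetical URIs standing in for the splash page, the PDF and
    # any additional material attached to a single deposit.
    base = "https://repository.example.org/item/123"
    for line in resource_map(
        base + "/rem",
        base + "#aggregation",
        [base, base + "/paper.pdf", base + "/supplement.zip"],
    ):
        print(line)
```

The point of the design is that the several URIs a repository mints for one item stop being rivals and become members of a single named aggregation, which is the shift from depositing items to depositing something closer to works.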

1 comment:

  1. OAI-ORE breaks googleJuice unless we serialise ORE as lists (e.g. foresite -> RSS yes but even more juicy if exposed as SiteMaps and browsable terms / breadcrumbs <- this is how Oxford makes its resources look like pacman pellets to google spiders (amongst a dozen other things @oxfordben uses: coins, unapi, etc).

