Tuesday, 5 August 2008

Comments on Negative Click Research Repository System

You may remember that I wrote a series of posts about a Research Repository System, aiming to improve deposits by getting repositories to do more that’s useful to the researcher. I had suggested it should contain these elements:
  • web orientation
  • researcher identity management
  • authoring support
  • object disclosure control
  • data management support
  • persistent storage
  • full preservation archive, and
  • spinoffs
There has been some interesting commentary on this idea. Some of it was in the comments to this blog, but most came through the JISC Repository and Preservation Advisory Group’s use of Ideascale to discuss issues related to a possible revision of the Repositories Roadmap. I decided to post some of the ideas from the RRS posts to the Ideascale forum, and they attracted comments, both positive and negative, and votes, and I will try to summarise key points here. Note, I have only picked out posts and comments that I consider relevant to this approach; there were many other interesting ideas and comments that you might want to read if you care about repositories.

BTW if you haven’t tried it, I was very impressed by the combination of technologies that Ideascale have put together, and the simplicity and value of its approach, akin to brainstorming but slower! Worth giving a whirl…

Overall, the thrust of the whole conversation seemed to support my feeling that repositories must serve their depositors rather than down-stream users (the institution, the department, colleagues, the public). This links strongly to the theme behind the RRS set of ideas.

The most popular “idea” on the JISC Ideascale site was Andy McGregor’s, not mine, but was related to this theme: “Define repository as part of the user’s (author /researcher/ learner) workflow” (21 voted for, 2 against, net 19). While there was general support, there was some concern as well: ojd20 wrote “I agree with this idea in the sense of ‘take account of things users need to do’, but not in the sense that we can reduce the myriad functions of an HE institution to a small set of flow charts and design repositories to fit those… ‘Familiar processes and tools’ is a much better way of expressing it [than workflow]” The other warning, from Owen Stephens, was “I'm also wary of comments elsewhere from the likes of Peter Murray-Rust on this. He warns (my interpretation) that we need to be careful of doing this as if it complicates the workflow, it just won't happen”. A. Dunning echoed a common theme in saying “Disagree - the repository is a back end data provider that should not be part of the researchers' / learners' workflow - but the service which sits on top of the repository should be part of the workflow.”

Rachel Heery showed another common concern, about possible over-complexity:
“I think the RRS you envisage sounds fantastic and would be a 'good thing', what worries me is the 'function creep' taking us a few miles on from some of the more basic, simpler 'few keystrokes' approach to repositories. The RRS sounds to have many features of a Virtual Research Environment, albeit perhaps a less data centric VRE…”
This echoed a comment from Bill Roberts on one of the original blog posts:
“Seriously, though - the system you've sketched out is quite complicated, not so much in the individual parts of it, (though several of those parts are quite substantial) but in the design vision required to connect up all those features in a way that would be easy and natural to use. Can you identify a subset of parts that would make a useful version 1?”
And Steve Hitchcock also commented on complexity to the same post:
“The irony of the negative click philosophy is that it has led you to produce a visionary but significantly more complex system.”
The surprising (to me) winner among the RRS ideas was “Allow the user fine-grained disclosure/access control to repository objects”. This gained 12 positive and 2 negative votes, for a net 10 votes, and no negative comments. I don’t have a good explanation for why this was so popular, noting that the voters were mainly repository managers or “experts”, especially since some of the corollary ideas were much less popular.

Another idea posted by Rachel Heery, not from RRS but worth taking note of (and getting 11 votes, none against) was “We shouldn't be thinking of repositories as a place.” In particular this meant “'repositories' are best viewed as a 'type' of data store supporting a variety of services, embedded in various workflows.”

I’ll include the Researcher Identity Management idea next, not for its own votes (4 in favour, 2 against), nor for the comments (of which there were none), but because of the linkages I had made between researcher identity and Current Research Information Systems (CRIS). The idea associated with CRIS was from Ian Stuart, and got 13 positive and 4 negative votes: “Repositories are dead, long live repositories” (yes, ideas are not always best titled!). He wrote
“The current repository technology is library/cataloger centric: items are uploaded (usually by a cataloger, not the author), and most of the meta-data is added by a subject specialist. In this model, the author-as-depositor is (at best) just an initiator for a deposit process.

A better solution would be to move towards a Combined [sic] Research Information System [CRIS], where the academic can organise their areas of interest [AOI]; see the research grants they have (and associate them with their AOI); lodge keep-safe copies of work-in-progress, data-sets, talks, ideas for future work, posters, etc (and associate them with grants or AOIs).

From this corpus of data, the academic can indicate what is visible locally (within the research group/department/organisation) and what is available globally... and from that "globally available" pool, an "Institutional Repository" can be assembled.”
The idea that the repository should be more web-oriented got 6 votes. There were no negative comments on the idea itself, but I drew a comparison (echoing Andy Powell) with Slideshare, and Paul Walk reacted against this: “More like the Web - yes! Like Slideshare - no!” He expands on this comment later (see below).

The idea that the repository should be more Web 2.0-like drew 6 positive but 5 negative votes, net 1 vote, possibly the most divided idea of all! I guess it’s one I was much less sure of, but included it because of comments to my earlier posts on this theme. A. Dunning wrote: “It's not so much the repository itself that needs Web2.0 but the range of the services which exploit that data.” And Paul Walk wrote:
“The repository should be able to participate in an interactive Web. It should be entirely possible for someone else to build a remote Web 2.0 service around resource exposed in my repository. This does not preclude me building such a service - if I think I have sufficient mass of interested users etc. but if I *need* to do this because only I can, then I have probably just built another silo.

This is what I meant by my assertion that, in general, Technorati offers us a better model than Slideshare. Someone can build a better, more focussed, domain specific etc. version of Technorati if there is demand for this, without needing to move or copy the resources in their 'source repositories' (blogs, in this case).”
[Paul, I have to say I’m unconvinced so far. A repository has to have the stuff, so that makes it more like Slideshare. And for various reasons, Technorati sucks (searches don’t work properly, lots of things fall off or go wrong). If you’d said Google instead, I might have got it. But searching by reference rather than by inclusion seems like going back to Z39.50 rather than search engine harvesting approaches (or even OAI-PMH). But maybe I’m resolutely getting the wrong end of this stick!]

Richard Davis wanted a foot in both camps:
“IMO, ideally a repo would be susceptible to all sorts of Flickrish RSS, Web API type manipulation - SWORD-like and who knows what else - leading to total personalisation potential of the user experience.

OTOH, providing implementers and users with an acceptable out-of-the-box UI, that might include basic implementations of some voguish features, and widgety ways to configure them - cf. Flickr or Wordpress - is no bad thing either.”
The idea of providing persistent storage got 6 positive and 2 negative votes, net 4 votes. Paul Walk was generally happy:
“I agree that persistence, more or less as you define it here, should be a core property of what I think of as a 'source repository'. In terms of ease of use: read, or 'get' access should be simple and should be provided through HTTP unless there is a very good reason not to. We should aspire to ease of use in terms of ingest and administrative activities.

In terms of synchronisation with offline storage (unless I have misunderstood your point): this is an interesting area, but I'm not sure it is a fundamental or universal function of 'the repository'. I think we are going to see a strange race between the development of mechanisms to do this (e.g. GoogleGears) and the progress in us all becoming so ubiquitously and permanently connected that we don't need this any more…”
The idea of data management support attracted 4 votes, none negative, and no comments. I’m not sure how to interpret this… but it was not the core interest of the group, I guess. I continue to think it an important missing service that falls into repository-space.

I split authoring support into two ideas: publishing and authoring. Of these, “The repository/library should provide support in the publishing process” got 6 votes, none negative. There were some concerns, eg Owen Stephens again “This needs restating
from a user perspective.” He thought there was value in “a system to help streamline the editorial/publication process is fine (if we can persuade academics to use it), but I'm not sure I'd start by building it on the repository - maybe you could, but there are other approaches as well.” In particular, he thought “I see the point about the benefits, but if we are solving problems our end users don't feel need solving then we have a much more difficult job on our hands (and perhaps not substantially different from our current problems with getting researchers to put things into our repository.”

More direct support for authoring was more divisive (5 votes for and 3 against, net 2 votes). In support, Owen Stephens wrote “The availability of research via Open Access would increase if the same systems that provide Open Access also provided, or were integrated with, tools which support the authoring process.” There was a bit of so-what from Ian Stuart: “This is a variation of what Peter Murry-Rust proposed back in '07: google-docs mixed with a CRIS.” [Yes, Ian, explicitly this idea came from some posts by PMR.] And one outright warning from forkel: “my experience is, that feature requests of this sort are exactly the ones which end up in the "users didn't know what they wanted bin" (I'm a developer).”

Finally (almost) I posted the idea that “The repository should be a full OAIS preservation system”. This turned out to be extremely unpopular, with only 2 votes for and 6 against, net -4! I was very disappointed (repositories are surely not for transient stuff that might be here today, gone tomorrow), so re-phrased it as “Repository should aspire to make contents accessible and usable over the medium term”, when again it attracted 2 votes for and this time none against. This was very late in the day, so maybe few people were coming back to vote on new ideas by then. It was clear that part of the problem was the term OAIS; my mistake for assuming in that context that people would have understood the term.

So where does this leave the Research Repository System idea? Not fully intact, I fear, but not shredded, either. Reduce complexity, think services operating on a store, take care to ensure these are services people really want, etc… It’s going to take me a while to work this through, but I will get back to you!


  1. Thanks for the summary Chris. Just on the OAIS question - I'm not sure this is just phrasing - doesn't the different response mean that people are in favour of making things accessible in the medium term, but don't necessarily see the OAIS approach acheiving this?

  2. How so great this his comment is very nice.


Please note that this blog has a Creative Commons Attribution licence, and that by posting a comment you agree to your comment being published under this licence. You must be registered to comment, but I'm turning off moderation as an experiment.