Tuesday, 10 June 2008

Negative click repositories

I wanted to write a bit more about this idea of a negative click repository (“negative cost” was a bad name, as there is a real, positive $ and £ cost, borne by the repository rather than by the depositor). First some ancient history...

When I joined the University of Glasgow in 2000, the Archives and Business Records Centre, with other collaborators within the University, were near the end of a short project on Effective Records Management (ERM, http://www.gla.ac.uk/infostrat/ERM/). During the course of that project, they surveyed committee clerks (who create many authoritative institutional records) on how much effort they were willing to put in, how many clicks they were willing to invest, to create records that would be easily maintainable in the digital era. The answer was: zero, none, nada! Rather than give up at this point, the team went on to create CDocS (http://www.gla.ac.uk/infostrat/ERM/Docs/ERM-Appendix2.pdf), an instrumented addition to MS Word that allowed the committee clerks to create their documents in university-standard forms, with agreed metadata, and with the documents and metadata automatically converted into XML for preservation and into HTML for display and sharing. ICE (see below) might be a contemporary system of a related kind, in a slightly different area. Thanks to James Currall for updating me on ERM and CDocS.

In April 2007, Peter Murray Rust had an epiphany thinking about repositories on the road to Colorado, realising that SourceForge was a shared repository that he had been using for years, and speculating that it might be used for writing an article. The tool for managing versions and sharing in SourceForge is SVN… Peter wrote about the complex workflow in writing a collaborative article, but then wrote:
“BUT using SVN it’s trivial - assuming there is a repository. So we do not speak of an Institutional Repository, but an authoring support environment (ASE or any other meaningless acronym. ) A starts a project in institutional SVN. B joins, so do C, D, E, etc. They all edit the m/s. Everyone sees the latest version. The version sent to the publisher is annotated as such (this is trivial). All subsequent stuff is tracked automatically. When the paper is published, the institution simply liberates the authorised version - the authors don’t even need to be involved. The attractive point of this - over simple deposition - is that the repository supports the whole authoring process.”
Many of those who left comments disagreed that the technology would work directly as suggested, for various reasons. Google Docs was mentioned as an alternative (still flawed). Peter Sefton mentioned ICE (and Murray Rust subsequently visited USQ to work briefly with the ICE team).

You may also remember Caveat Lector’s series of personae representing stakeholders in the repository game at fictional Achaea University, that I reported on before. Ulysses Acqua was her repository manager, and here’s a quote from his attempts to explain the advantages of his repository to faculty; they ask:
“Can it produce CVs or departmental-activity reports automatically? No. Can it be tweaked so that the Basketology collection looks like the Basketology website? No. (The software can do that, in fact, but Ulysses can’t.) Can it talk to campus IT’s file-storage-cum-website servers? No. Can it harvest faculty articles from disciplinary repositories? No. Can it deliver records straight to Achaea Library’s catalogue? No. Can it have access controls per item, such that items are shared with specific people only, with the list controlled by the depositor? No. Can it embargo items, for a certain length of time or indefinitely? No. Can it read a citation, check rights on the journal, and go fetch the paper if rights are cleared? Dream on. Can it restrict items for campus-only access by IP address? No. Does it talk to RefWorks and Zotero and similar bibliographic managers? No. Does it do version control? No.”
The problem here is that the repository adds work; it doesn’t take it away (there are other examples of this in some of the other personae). And overloaded people don’t accept extra work. They may promise to, but they (mostly) don’t do it.

Finally, I posted earlier on Nico Adams’s comments on repositories for scientists. He got stuck, he said: "I had to explain what a repository is from the point of view of functionality - when talking to non-specialist managers, it is the only way one can sell and explain these things…they do not care about the technological perspective…the stuff that’s under the hood. I found it impossible to explain what the value proposal of a repository is and how it differentiates itself in terms of its functionality from, say, the document management systems in their respective companies." It’s worth repeating a couple of extracts from his conclusions:
"Now today it struck me: a repository should be a place where information is (a) collected, preserved and disseminated (b) semantically enriched, (c) on the basis of the semantic enrichment put into a relationship with other repository content to enable knowledge discovery, collaboration and, ultimately, the creation of knowledge spaces. "
And further:
"Now all of the technologies for this are, in principle, in place: we have DSpace, for example, for storage and dissemination, natural language processing systems such as OSCAR3 or parts of speech taggers for entity recognition, RDF to hold the data, OWL and SWRL for reasoning. And, although the example here, was chemistry specific, the same thing should be doable for any other discipline. … Yes it needs to be done in a subject specific manner. Yes, every department should have an embedded informatician to take care of the data structures that are specific for a particular discipline. But the important thing is to just make the damn data do work!"
So we need to develop repositories that make the data work to take human work away: negative click repositories. Or maybe not… (sorry about this). I’m just a bit concerned that we might get a set of monumental edifices built on DSpace or ePrints.org foundations, resulting in “take it or leave it” decisions. Institutional information environments are highly tailored (not always carefully), and at the department level they are even more eclectic, and things have to fit together. Maybe, as Nico was suggesting, what we need is an array of tools, connected together by technologies like Atom/RSS and/or OAI-ORE, that can be configured so as to link the components into an information management system that works to reduce the publishing effort on campus, and captures the intellectual product on the way.
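To make the "tools connected by Atom" idea a little more concrete, here is a minimal sketch of one such link: a repository-side script that reads an Atom feed published by some authoring tool and picks out the entries that carry a file. The feed content, the URLs and the function name `deposit_candidates` are all invented for illustration; this is not how DSpace, ePrints or ICE actually work.

```python
import xml.etree.ElementTree as ET

# Standard Atom namespace, needed for every element lookup below.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical feed that a departmental authoring tool might publish.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Basketology working papers</title>
  <entry>
    <title>Weaving patterns, draft 3</title>
    <id>urn:example:paper-1</id>
    <updated>2008-06-01T10:00:00Z</updated>
    <link rel="enclosure" href="http://example.org/papers/1.pdf"/>
  </entry>
  <entry>
    <title>Notes without a file</title>
    <id>urn:example:paper-2</id>
    <updated>2008-06-02T10:00:00Z</updated>
  </entry>
</feed>"""


def deposit_candidates(feed_xml):
    """Return one dict per feed entry that has an enclosure link."""
    root = ET.fromstring(feed_xml)
    candidates = []
    for entry in root.findall("atom:entry", ATOM_NS):
        link = entry.find("atom:link[@rel='enclosure']", ATOM_NS)
        if link is None:
            # Nothing to fetch, so nothing to deposit.
            continue
        candidates.append({
            "id": entry.findtext("atom:id", default="", namespaces=ATOM_NS),
            "title": entry.findtext("atom:title", default="", namespaces=ATOM_NS),
            "updated": entry.findtext("atom:updated", default="", namespaces=ATOM_NS),
            "href": link.get("href"),
        })
    return candidates


if __name__ == "__main__":
    for item in deposit_candidates(SAMPLE_FEED):
        print(item["updated"], item["title"], item["href"])
```

The point of the sketch is only that the author does nothing extra: the authoring tool already publishes the feed, and the repository does the clicking.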

People appear to have lots of ideas on what negative click repositories might do. We could have tools for supporting the scientist in their information sharing (web sites, bibliographies and CVs). Tools for shared data management. Tools for shared writing. Tools even for the library to support faculty in dealing with publishers. And of course tools to help management count their beans. I’d like to begin to collect and order these and other suggestions if possible, so please leave comments or tag your blogs with “negative click”…


  1. Three cheers! Huzzah! Yes!

  2. We have also recently come to similar conclusions at Imperial - we need to offer value back to the researcher for depositing that goes beyond the 'open access' argument.

    We recently had an example of self-deposit by an academic of around 35 articles - none of which could be approved because of copyright issues. The result of this is the researcher has spent valuable time and got absolutely no reward - nothing. How likely are they to come back and try again?

    We clearly need to look at ways of adding value (for the researcher) to anything deposited, as well as minimising the barriers to deposit (perhaps following up on the suggestion of integrating 'repositories' into the authoring process)

    A couple of quotes at the Talis Xiphos Research Day yesterday were (I paraphrase):

    Academic: "ah, so repositories are about making content available on the web. This is what I was missing - why I didn't 'get' repositories"

    Peter Murray Rust: "Deposit is trivial if authored appropriately"

  3. Owen, thanks for the comment. I hope you can hold your academic's contributions "dark" pending copyright resolution. Policies do tend to change. One option is to ask the researcher to go public on the absurdity... might work, but might be too big an ask, perhaps.

    I'm going to try to think a bit more about what kinds of things a negative click repository system might do, and try to list them on the blog. As you suggest, other ways of adding value might help. One might be simply converting source word processing files to TAGGED PDF, for example (on the theory that it's easier to data-mine...).

  4. We're in the process of developing a digital repository (previous experiments here mostly being "pilot" programs with no institutional support), and one of the people involved in the process has argued (persuasively to me, at least) that the repository ought to do exactly this - integrate with business-as-usual for our faculty, and not require them to invest extra time or "clicks" in it.

    We tried to come up with some way to identify what it was, and eventually settled on the phrase "repository-as-scholarly workbench." It's not perfect, but it worked for us.

    However, I still wonder how such tools could be instituted. Forgive me if I'm misunderstanding the concept, but if people are miserly with their clicks (I know I am) then asking for any change to their normal workflow is going to be tough - especially for faculty who have been perfectly successful using their own idiosyncratic workflows.

  5. I'm very interested in ways to integrate deposit with the normal workflow of authors. One thought would be to set up a virtual drive and mount it throughout campus (and give authors instructions on how to mount it from home). Then, when the author is done composing the article, they could simply save a copy to their folder on the virtual drive. Whatever was saved in these folders would be queued for deposit in the IR. (This is already how many people use FTP/SSH to upload files to their Web space.)

    Thinking ahead, could you set up autosave in MS Word, OpenOffice, etc. to save each revision to the virtual drive? This would completely integrate deposit in the workflow, requiring no additional steps for deposit, but would need the author to approve the "final" draft for public access.

  6. What this still relies on is that anyone at the university outside of the group that sets it up and maintains it will: have even an initial awareness of the repository; have an understanding of the benefits; be in agreement that the service is necessary; and not rebel if their hands are forced to comply with an institutional mandate.

    I function at a university where these concerns are all too well founded, and I have found no solutions to make them less so.
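The shared-drive suggestion in comment 5 can be sketched roughly as follows, assuming each author has a folder under a common drop directory and that a scheduled job scans it. The folder layout and the function names are hypothetical, and nothing here is made public: the scan only builds a queue, leaving the author to approve the "final" draft as the commenter proposes.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """SHA-256 of the file contents, used to detect new revisions."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def scan_drop_folder(root, seen):
    """Return (author, path) pairs for files that are new or changed.

    Assumes a layout of <root>/<author>/<file...>. `seen` maps a path
    string to the digest recorded on the previous scan and is updated
    in place, so repeated scans only report fresh revisions.
    """
    root = Path(root)
    queue = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(root)
        if len(relative.parts) < 2:
            # File sits directly in the drop root, not in an author folder.
            continue
        digest = file_digest(path)
        key = str(path)
        if seen.get(key) == digest:
            continue  # unchanged since the last scan
        seen[key] = digest
        queue.append((relative.parts[0], path))
    return queue
```

A job run every few minutes could call `scan_drop_folder` with a persistent `seen` dictionary and hand each queued file to whatever deposit interface the repository offers.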


Please note that this blog has a Creative Commons Attribution licence, and that by posting a comment you agree to your comment being published under this licence. You must be registered to comment, but I'm turning off moderation as an experiment.