Sunday 15 June 2008

Reaction to Negative Click Repositories

I was very pleased with reactions to the Negative Click Repositories post. I’ll come back to the idea itself in a later post, but this just attempts to gather some of the comments together (the first few are comments to the original post, but I know my blogreader doesn’t expose those easily).

Owen from Imperial writes:
“We clearly need to look at ways of adding value (for the researcher) to anything deposited, as well as minimising the barriers to deposit (perhaps following up on the suggestion of integrating 'repositories' into the authoring process).”
Chris Clouser reported they had been struggling with similar ideas:
“We tried to come up with some way to identify what it was, and eventually settled on the phrase "repository-as-scholarly-workbench." It's not perfect, but it worked for us.”
Gavin Baker wrote:
“I'm very interested in ways to integrate deposit with the normal workflow of authors. One thought would be to set up a virtual drive and mount it throughout campus (and give authors instructions on how to mount it from home). Then, when the author is done composing the article, they could simply save a copy to their folder on the virtual drive. Whatever was saved in these folders would be queued for deposit in the IR. (This is already how many people use FTP/SSH to upload files to their Web space.)

“Thinking ahead, could you set up autosave in MS Word, OpenOffice, etc. to save each revision to the virtual drive? This would completely integrate deposit in the workflow, requiring no additional steps for deposit, but would need the author to approve the ‘final’ draft for public access.”
Moving to blog reactions, Pete Sefton reported on the OR08 work on “zero click deposit” (which I should have given some credit to). He suggests (perhaps tongue in cheek) that I’m unfair in asking for more. He also linked back to earlier thoughts on Workflow 2.0:
“But in a Workflow 2.0 system the IR might not need to have a web ingest. Instead there could be an ingest rule that would fire when a certain context was detected, such as a document attaining a certain state of development, which would be reflected in it's metadata. That's the 'data as the driving force'... In this case it's metadata that's doing the driving.”
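One way to picture the kind of ingest rule Pete describes is the small Python sketch below. It is only an illustration under my own assumptions: the `Document` class, the `status` metadata key, and the `run_rules` helper are hypothetical names, not part of ICE, ePrints, or any real repository software.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """A document plus whatever metadata the authoring tool records (hypothetical)."""
    title: str
    metadata: dict = field(default_factory=dict)


def ingest_rule(doc):
    """Assumed rule: fire only when the metadata marks the document as final."""
    return doc.metadata.get("status") == "final"


def run_rules(documents, deposit):
    """Call deposit(doc) for each document whose metadata triggers the rule.

    Returns the titles of the documents that were deposited.
    """
    fired = []
    for doc in documents:
        if ingest_rule(doc):
            deposit(doc)
            fired.append(doc.title)
    return fired
```

The point of the sketch is that the author never visits a deposit form: changing the document's state in its metadata is the only trigger, and `deposit` could be any function that hands the document to the IR.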
Next Pete talked about his participation in a JISC project (with Cambridge and some others):
“In TheOREM we’re going to set up ICE as a ‘Thesis Management System’ where a candidate can work on a thesis which is a true mashup of data and document aka a datument... When it’s done and the candidate is rubber-stamped with a big PhD, the Thesis management system will flag that, and the thesis will flow off to the relevant IR and subject repositories, as a fully-fledged part of the semantic web, thanks to embedded semantics and links to data.”

“One more idea: consider repository ingest via Zotero or EndNote. In writing this post I just went to the OR08 repository and used Zotero to grab our paper from there so I could cite it here. It would be cool to push it to the USQ ePrints with a single click.”
Then, under the heading of “Net-benefit repository workflows”, he cautions:
“I think it is a mistake to focus on the Negative Click repository as a goal and assume that it’s going to happen without changing user behavior…
If we can seed the change by empowering some researchers to perform much better – to more easily write up their research, to begin to embed data and visualizations, to create semantic-web-ready documents – then their colleagues will notice and make the change too.”
Les Carr had (as usual) some sensible remarks:
“I think I would rather talk about value or profit - the final outcome when you take the costs and benefits into consideration. Do you run a positive value repository? Is it frankly worth the effort? Are your users in scholarly profit, or are you a burden on their already overtaxed resources?”
Les was a bit worried I had taken the Ulysses Acqua scenario too literally. Don't worry, Les, I do get it as fiction, just a particularly apposite one in many cases, if not for your repository. Les concludes:
“Anyway, I think that I am in violent agreement with Chris, so to show solidarity I will do what he asked and list some positive value generators: publicity and profile (CVs, Web pages, displays, adverts for MScs/PhDs/staff), community discovery, laptop backup and asset management.”
Thanks, Les.

Meanwhile Chris on the Logical Operator has a long, thoughtful post. Go read it! Of the “negative click repository” idea, he writes:
“It’s a laudable goal, but I’m not sure how one would, in practice, implement such a thing, since even the most transparent approach has a direct impact on workflow (the exact thing that most potential contributors appear not to want to mess with - they don’t want to change the way they work). The challenge is to create something that:

* Integrates with any workflow
* Involves no (or essentially no) extra activities on the part of the scholar
* Uses the data structure of the material to create the interrelations critical to really opening access
* Provides a visible and obvious value-add that can be explained without looking under the hood
* Is not just a wrapper on some existing tool”
Chris also likes the idea of the “scholarly workbench” approach, and thinks this might include:
“Storage: the system (or aggregate) would need to store copies of the relevant digital object in such a way that it could be reliably retrieved by any user. Most any system - Writeboard, SlideShare, even wiki software have this capability.

Interoperation: the system would need to allow other systems to locate said material by hooks into the system. Many W2.0 applications allow access directly to their content via syndication.

Metadata: tagging is nearly universal across W2.0 applications - it’s how people find things, the classic “tag-based folksonomy” approach.

Sharing: again, this is the entire point of most W2.0 applications. It’s not fun if you can’t share stuff. Community is the goal of most of the common web applications.

Ease of use: typically, Web 2.0 applications are designed to get you up and running quickly. You do very little to establish your account, and there are no complex permissions or technical fiddling to be done.”

Last but one (for this post, anyway), Nico Adams has some more thoughts on Staudinger’s Molecules. Again, the whole post is worth a read; I have not yet looked at the piece of technology he is playing with:
“I have been wondering for a while now, whether the darling of the semantic web community, namely Radar Network’s Twine, could not be a good model for at least some of the functionality that an institutional repo should/could have. It takes a little effort to explain what Twine is, but if you were to press me for the elevator pitch, then it would probably be fair to say that Twine is to interests and content, what Facebook is to social relationships and LinkedIn to professional relationships.”
There’s a lot more about Twine, including some screen shots, so go have a read.

Finally, a post that was not a reaction, but might have been. I should have given credit to Andy Powell, who has been questioning the current implementations of repositories (despite being joint author of the JISC Repositories Roadmap [.doc]). Andy wants repositories to be more consistent with the web architecture. He spoke at a Talis workshop recently; his slides are here (on Slideshare, one of his models for a repository). Cameron Neylon on Science in the Open was also at the event, and blogged about Andy’s talk. He also wrote:
“But the key thing is that all of this [work to deposit] should be done automatically and must not require intervention by the author. Nothing drives me up the wall more than having to put the same set of data into two subtly different systems more than once. And as far as I can see there is no need to do so. Aggregate my content automatically, wrap it up and put it in the repository, but I don’t want to have to deal with it. Even in the case of peer reviewed papers it ought to be feasible to pull down the vast majority of the metadata required. Indeed, even for toll access publishers, everything except the appropriate version of the paper. Send me a polite automated email and ask me to attach that and reply. Job done.

For this to really work we need to take an extra step in the tools available. We need to move beyond files that are simply ‘born digital’ because these files are in many ways still born. This current blog post, written in Word on the train is a good example. The laptop doesn’t really know who I am, it probably doesn’t know where I am, and it has not context for the particular word document I’m working on. When I plug this into the Wordpress interface at OpenWetWare all of this changes. The system knows who I am (and could do that through OpenID). It knows what I am doing (writing a Blog post) and the Zemanta Firefox plug in does much better than that, suggesting tags, links, pictures and keywords.

Plugins and online authoring tools really have the potential to automatically generate those last pieces of metadata that aren’t already there. When the semantics comes baked in then the semantic web will fly and the metadata that everyone knows they want, but can’t be bothered putting in, will be available and re-useable, along with the content. When documents are not only born digital but born on and for the web then the repositories will have probably still need to trawl and aggregate. But they won’t have to worry me about it. And then I will be a happy depositor.”
Well Cameron, we want your content! So overall, there’s something here. I like the idea of positive value better than negative clicks, but I haven’t got a neat handle on it yet. Scholar’s workbench doesn’t really thrill me; I see what they mean, but the term is too broad, and scientists don’t always respond to the scholar term. I’ll keep thinking!

Please note that this blog has a Creative Commons Attribution licence, and that by posting a comment you agree to your comment being published under this licence. You must be registered to comment, but I'm turning off moderation as an experiment.