Digital Curation Blog: Subject Repositories


Monday, 30 June 2008

Repository Fringe

Robin Rice passed on this announcement about Edinburgh's newest festival, the Repository Fringe. She writes: "the event is being jointly planned by Edinburgh and Southampton to coincide with the 'preview week' before the opening of the Edinburgh Festival Fringe. In the spirit of the Fringe, it's a kind of an 'unconference' and we're encouraging people to sign up to participate in various ways on the wiki, e.g. by leading a 'performance' of some sort - a soapbox, a group improv, an Audience With, or as part of the Poster Bazaar." She also tells me that Dorothea Salo from Caveat Lector is giving the keynote. Here is the original email...

"Enjoy fresh debate and face-to-face discussion at Edinburgh's newest and entirely unofficial festival - the Repository Fringe! We would like to invite all repository researchers, managers, administrators & developers to come to Edinburgh for a 2-day workshop on 31st July - 1st August.

The event is free to attend - you just need to register and find your own way up to Edinburgh.

The event will be an informal opportunity to:
discuss new and emerging repository issues
generate new ideas and perspectives
show off some new project results
meet up with like-minded repository managers, administrators, librarians and developers
Research data, social networking, bibliographic services or desktop integration? Whatever your interest and problems, come and give a presentation, kickstart some activity or work on a demonstration.

Put this event in your diary and sign-up online to book your place. Visit the wiki to see the latest programme or register your interest.

Online Registration - www.regonline.co.uk/63349_631697J
Accommodation booking - http://www.repositoryfringe.org/accommodation.html
Website - http://www.repositoryfringe.org/
Wiki - http://wiki.repositoryfringe.org/

Hope to see you in Edinburgh!

Theo Andrew
Digital Library Section
Library & Collections
The University of Edinburgh
www.era.lib.ed.ac.uk"

Monday, 31 March 2008

Repositories for scientists

Nico Adams in a post to Staudinger's Semantic Molecules has added to the scenarios for repositories for scientists:

"Now today it struck me: a repository should be a place where information is (a) collected, preserved and disseminated (b) semantically enriched, (c) on the basis of the semantic enrichment put into a relationship with other repository content to enable knowledge discovery, collaboration and, ultimately, the creation of knowledge spaces. "

The linkage seems to be the important thing, which current repositories don't do well. He goes on:

"Let’s take a concrete example [CR: scenario, I think] to illustrate what I mean: my institutional repository provides me with a workspace, which I can use in my scientific work everyday. In that workspace, I have my collection of literature (scientific papers, other people’s theses etc.), my scientific data (spectra, chromatograms etc) as well as drafts of my papers that I am working on at the moment. Furthermore, I have the ability to share some of this stuff with my colleagues and also to permanently archive data and information that I don’t require for projects in the future."

He then develops this argument further, making a lot of sense, and concludes:

"Now all of the technologies for this are, in principle, in place: we have DSpace, for example, for storage and dissemination, natural language processing systems such as OSCAR3 or parts of speech taggers for entity recognition, RDF to hold the data, OWL and SWRL for reasoning. And, although the example here, was chemistry specific, the same thing should be doable for any other discipline. As an outsider, it seems to me that what needs to happen now, is for these technologies to converge and integrate. Yes it needs to be done in a subject specific manner. Yes, every department should have an embedded informatician to take care of the data structures that are specific for a particular discipline. But the important thing is to just make the damn data do work!"
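To make the linkage step a little more concrete, here is a minimal sketch of point (c) in Nico's list: relating repository items to one another on the basis of their semantic enrichment. It is only an illustration, not anything DSpace, OSCAR3 or an RDF store actually provides. The item identifiers, the entity tags and the function names `link_items` and `related_to` are all hypothetical, and the sketch assumes some enrichment step has already tagged each item with the entities it mentions.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical repository items. The entity sets stand in for the output
# of an enrichment step (e.g. an entity recogniser); they are made up.
ITEMS = {
    "paper-draft-1": {"polystyrene", "NMR"},
    "spectrum-042": {"polystyrene", "NMR"},
    "thesis-smith": {"polystyrene", "chromatography"},
    "chromatogram-7": {"chromatography"},
}


def link_items(items):
    """Map each pair of items that share an entity to the entities they share."""
    # Invert the data: entity -> set of items that mention it.
    by_entity = defaultdict(set)
    for item_id, entities in items.items():
        for entity in entities:
            by_entity[entity].add(item_id)

    # Every pair of items under the same entity gets a link.
    links = defaultdict(set)
    for entity, item_ids in by_entity.items():
        for a, b in combinations(sorted(item_ids), 2):
            links[(a, b)].add(entity)
    return dict(links)


def related_to(item_id, links):
    """List the items linked to item_id, in alphabetical order."""
    related = set()
    for a, b in links:
        if a == item_id:
            related.add(b)
        elif b == item_id:
            related.add(a)
    return sorted(related)


if __name__ == "__main__":
    links = link_items(ITEMS)
    for other in related_to("thesis-smith", links):
        print("thesis-smith is related to", other)
```

In a real system the shared-entity test would presumably be replaced by queries and reasoning over RDF, but the shape is the same: enrichment produces the tags, and the links fall out of the tags rather than being entered by hand.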

I have heard it suggested elsewhere that DSpace may not be up to this kind of linkage, and also that FEDORA is a suitable candidate, but I don't have sufficient experience of the workings of either to be sure. Can anyone comment?

Friday, 20 July 2007

Subject "versus" institutional repositories

There's a concept in maths called "closed but unbounded". I'm not sure it's exactly to the point (I hope that's a pun), but "subjects" seem a bit like that. You can be pretty sure about most of the stuff that's not in a subject (or "domain"), and most of the stuff that is in it, but you can be very puzzled about some of the edges, and can find yourself in some extremely surprising discussions at times about parts of subjects that challenge most of the ideas you had. So subjects turn out to be very un-bounded. (They also tend to fracture, productively.) Perhaps not surprisingly, subjects don't tend to have assets, bank balances, etc. You might say, in those senses, subjects don't exist! They do nevertheless have very real approaches, common standards, ontologies, methods, vocabularies, literatures... and passionate adherents spread across institutions.

Institutions on the other hand, or at least universities, tend to be very material. They do have assets, bank balances, policies, libraries, employees, continuity on a significant scale, even (in the US at least) endowments. They have temporal stability and mass. They collect scholars and scientists in various domains... even if the scientists give their loyalty to their subjects, and are held together only by salaries and a common loathing of the university car parking policy!

Institutions have continuity, and they have libraries, and archives, which in serious ways express that continuity. Libraries are not about print. Libraries are now squarely about knowledge and information expressed in data, whether they know it or not. And the continuity of valuable data is an important reason for libraries to be involved.

But institutions are generic, and libraries are generic, even in more focused institutions like MIT. The library, the archive, the IR, in different ways, are about collecting elements of the scholarly discourse that contribute both globally and locally. So institutional repositories are about generic continuity of data, as libraries are about continuity of collections. IRs create value for the institution, even if it is only a small piece of value (like most other individual "collections" in an institution). If you don't play, you aren't in the game. You know data has value, just not which bits. You need to disclose your scholarly assets, across the spectrum; you can feel proud of doing so, and make a case for local benefit at the same time. You are an institution taking part in a global system; the value may be in the network, but you are part of that network.

But the way an IR treats data is necessarily generic; if you get data from chemistry, engineering, social sciences and performing arts into this under-funded but potentially valuable repository, you will do your best but it will necessarily be variants of generic practice, at best.

So back to the "subject"; if there is a data repository here, it is likely staffed by "domain experts", capable of taking on a "community proxy" role. They know their stuff. They will treat their data in domain-specific ways; they will know where to seek out data to complement their collection, they will know how to make connections between different parts. They can describe it appropriately, they can develop standards with their colleagues. They will know how to help their colleague scientists extract maximum value. Some subject repository managers are seriously concerned about the problems for disciplines if institutional repositories expand into the data "space".

What subject repositories don't usually have is what institutions have: substantial assets, endowments, bank balances, tenured staff. They are usually built around multiple project grants; 5-year core funding is a prized goal, and it comes at the price of cheese-paring budgets and mid-term reviews every second year. Subject repositories don't have assured continuity or temporal mass.

The NSB LLDDC (Long-lived digital data collections) report, and now the NSF CyberInfrastructure strategy, are aimed at this area; they have spotted the fragility of these subject data collections. In the UK we have possibly even more of a patchwork of funding mechanisms than was observed in the LLDDC report. JISC used to be a significant funder of subject repositories, but in recent years has been retrenching from them, while building up massive funding in IRs. AHRC, as we have seen, is pulling back from funding the AHDS.

So what would make this better? First, I'd like to see a substantive discussion about the roles and funding mechanisms of subject and institutional repositories. In the UK, this would have to involve at least the Research Councils, Wellcome Trust and JISC. (Perhaps this looks less likely than it did when I first wrote this.)

Secondly, I'd like to see JISC in the final tranche of its capital funding (here's the circular recently closed) explore the bounds of what's possible with the data provider/ service provider combination (maybe OAI/ORE will address this a little? Maybe not!). And what if curation is detached from the repository? What if data continuity/preservation is separated from the curation service? Do these questions even make sense?

Maybe a system or federation of sustainable IRs internally divided into sets on subject lines (and hence externally aggregatable along those lines), with subject-oriented curation activities picking up on "invisible college" volunteerism might work? Splitting curation into generic and domain elements... Or other notions, pushing the skill out into the network, the federation, but retaining the data where the assets and continuity lie?
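One way to picture "sets on subject lines" is the selective harvesting that OAI-PMH already offers, where a harvester asks each repository for the records in a named set. That mapping is my assumption, not a worked-out design. The sketch below only builds the request URLs a subject aggregator might issue; the repository names, base URLs and the `chemistry` set name are hypothetical placeholders, and nothing is fetched.

```python
from urllib.parse import urlencode

# Hypothetical institutional repositories; the base URLs are placeholders.
IR_ENDPOINTS = {
    "uni-a": "https://repository.uni-a.example/oai",
    "uni-b": "https://repository.uni-b.example/oai",
}


def list_records_url(base_url, set_spec, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request restricted to one set."""
    query = urlencode({
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "set": set_spec,
    })
    return f"{base_url}?{query}"


def subject_harvest_plan(endpoints, set_spec):
    """Give one request URL per repository for the same subject set."""
    return {
        name: list_records_url(url, set_spec)
        for name, url in sorted(endpoints.items())
    }


if __name__ == "__main__":
    for name, url in subject_harvest_plan(IR_ENDPOINTS, "chemistry").items():
        print(name, url)
```

This only works if the participating IRs agree on set names, which is exactly the kind of domain-level coordination that the "invisible college" would have to supply.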

[This posting is based on an email I sent to a closed JISC Repositories advisory group some time ago; it seems even more relevant today...]
