Monday, 31 March 2008

Repositories for scientists

Nico Adams in a post to Staudinger's Semantic Molecules has added to the scenarios for repositories for scientists:
"Now today it struck me: a repository should be a place where information is (a) collected, preserved and disseminated (b) semantically enriched, (c) on the basis of the semantic enrichment put into a relationship with other repository content to enable knowledge discovery, collaboration and, ultimately, the creation of knowledge spaces."
The linkage seems to be the important thing, which current repositories don't do well. He goes on:
"Let’s take a concrete example [CR: scenario, I think] to illustrate what I mean: my institutional repository provides me with a workspace, which I can use in my scientific work everyday. In that workspace, I have my collection of literature (scientific papers, other people’s theses etc.), my scientific data (spectra, chromatograms etc) as well as drafts of my papers that I am working on at the moment. Furthermore, I have the ability to share some of this stuff with my colleagues and also to permanently archive data and information that I don’t require for projects in the future."
He then develops this argument further, making a lot of sense, and concludes:
"Now all of the technologies for this are, in principle, in place: we have DSpace, for example, for storage and dissemination, natural language processing systems such as OSCAR3 or parts of speech taggers for entity recognition, RDF to hold the data, OWL and SWRL for reasoning. And, although the example here, was chemistry specific, the same thing should be doable for any other discipline. As an outsider, it seems to me that what needs to happen now, is for these technologies to converge and integrate. Yes it needs to be done in a subject specific manner. Yes, every department should have an embedded informatician to take care of the data structures that are specific for a particular discipline. But the important thing is to just make the damn data do work!"
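To make the linkage idea a little more concrete, here is a minimal sketch of how repository items might be related once entities have been extracted from them. It is only an illustration under stated assumptions: the triples, the identifiers such as `paper:42`, and the functions `link_items` and `related_to` are all hypothetical, plain Python tuples stand in for RDF, and nothing here reflects how DSpace or OSCAR3 actually work.

```python
from collections import defaultdict

# Hypothetical (subject, predicate, object) triples. Every identifier below is
# invented for illustration; none comes from DSpace, OSCAR3 or a real repository.
TRIPLES = [
    ("paper:42", "mentions", "compound:benzene"),
    ("spectrum:7", "measures", "compound:benzene"),
    ("thesis:3", "mentions", "compound:toluene"),
    ("spectrum:9", "measures", "compound:toluene"),
    ("paper:42", "mentions", "compound:toluene"),
]


def link_items(triples):
    """Group repository items by the entity they refer to.

    Only entities referenced by more than one item are kept, since a single
    reference gives nothing to link.
    """
    by_entity = defaultdict(set)
    for subject, _predicate, obj in triples:
        by_entity[obj].add(subject)
    return {
        entity: sorted(items)
        for entity, items in by_entity.items()
        if len(items) > 1
    }


def related_to(item, triples):
    """Return the other items that share at least one entity with `item`."""
    related = set()
    for items in link_items(triples).values():
        if item in items:
            related.update(items)
    related.discard(item)
    return sorted(related)


if __name__ == "__main__":
    # Which items are connected to the spectrum through a shared compound?
    print(related_to("spectrum:7", TRIPLES))
```

In this toy data the spectrum and the paper are connected only because both refer to the same compound, which is the sort of relationship a repository would have to record before it could support discovery across its content.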
I have heard it suggested elsewhere that DSpace may not be up to this kind of linkage, and also that FEDORA is a suitable candidate, but I don't have sufficient experience of the workings of either to be sure. Can anyone comment?

1 comment:

  1. Hi Chris,

    Just left a couple of links to some very different lifecycle models on the DDC "Lifecycle" forum.

    "what needs to happen now, is for these technologies to converge and integrate. Yes it needs to be done in a subject specific manner". That's the HUB of it.

    The trouble is that every discipline prefers its own technologies (and formats), so expecting technologies to converge and integrate on a 'one size fits all' basis is just a furphy.

    I think you'll find that if you spend less time worrying about the semantic data of institutional repositories and more time thinking about how to classify the (very different, interactive) "knowledge spaces", then the subject specific communities might be able to figure out when and where they want to link, or not.

    Here's a community index in the space to consider.

    Now if we could just find "an embedded informatician to take care of the(ir) (community, not departmental) data structures". Of course, that presupposes an informatician who understands how to integrate ALL the different communities' applications (or even just a few). Regards.


Please note that this blog has a Creative Commons Attribution licence, and that by posting a comment you agree to your comment being published under this licence. You must be registered to comment, but I'm turning off moderation as an experiment.