Tuesday, 14 October 2008

ARROW Repositories day: 1

I’ve been giving a talk about the Research Repository System ideas at the ARROW repository day in Brisbane, Australia (which is partly why there has been a gap in posting recently). Here are some notes on the other talks.

Kate Blake from ARROW is talking about metadata, which is particularly important for data, since data cannot speak for itself. She treats metadata as a compound object: some parts cover “library management” issues (things like author, title, keyword) for the whole document and/or its parts, and others cover University management, such as evidence records for research quality management purposes. These link to metadata that applies to the community of practice, eg the appropriate metadata for an X-ray image. So we have the content (maybe a PDF), its rich metadata (Kate used MARC/XML as an example, which surprised me, since she also suggested this group was specific to the content), lightweight descriptive metadata, technical metadata (file size, type etc), administrative metadata (eg rights or other kinds of institutional metadata), preservation metadata such as PREMIS, and both internal and external relationship metadata. METS is one way to wrap this complex set of metadata and provide a structural map (there are others). (Worrying that this seems like a very large quantity of metadata for one little object…) Aha, she’s pointing out that aggregating these into repositories, and then aggregating those repositories across search services, leads to problems of duplication, inconsistency, waste of effort, etc. So there is lots of work trying to unravel this knot, and lots of acronyms: RDF, FRBR, SWAP, DCMI AM, SKOS etc…
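To make the "compound object" idea a little more concrete, here is a rough Python sketch (not something shown in the talk). The class name `CompoundRecord`, the section names and every field value are hypothetical; this only mimics the shape of a METS-style wrapper and is not METS itself.

```python
from dataclasses import dataclass, field


@dataclass
class CompoundRecord:
    """Hypothetical wrapper: one content file plus its metadata sections."""
    content_file: str
    # section name -> dict of metadata fields (all names are illustrative)
    sections: dict = field(default_factory=dict)
    # ordered labels for the parts of the object, loosely like a structural map
    structural_map: list = field(default_factory=list)

    def add_section(self, name, fields):
        """Store a copy of the fields under the given section name."""
        self.sections[name] = dict(fields)

    def field_count(self):
        """Total number of metadata fields across all sections."""
        return sum(len(fields) for fields in self.sections.values())


record = CompoundRecord(content_file="paper.pdf")
record.add_section("descriptive", {"title": "Example paper", "author": "A. Researcher"})
record.add_section("technical", {"mime_type": "application/pdf", "size_bytes": 123456})
record.add_section("administrative", {"rights": "open access"})
record.add_section("preservation", {"fixity_algorithm": "SHA-256"})
record.add_section("relationships", {"is_part_of": "example-collection"})
record.structural_map = ["front matter", "body", "figures"]

print(sorted(record.sections))
print(record.field_count())
```

Even this toy record carries five sections for a single PDF, which gives some feel for how quickly the metadata piles up.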

FRBR makes the distinction between the work, its expressions, manifestations and items. SWAP is a profile for scholarly works treated as text works, so it is not much use for data.
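The four FRBR levels nest inside each other, which a small sketch may help show. This is only an illustration: the class layout and the example thesis, versions and locations are invented, and real FRBR has many more attributes and relationships than this.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    """A single concrete copy, eg a file held in one repository."""
    location: str


@dataclass
class Manifestation:
    """A physical or digital embodiment, eg a PDF rendering."""
    media_type: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    """A particular realisation of the work, eg a specific version."""
    description: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    """The abstract intellectual creation."""
    title: str
    expressions: List[Expression] = field(default_factory=list)

    def all_items(self):
        """Flatten the hierarchy down to the individual copies."""
        return [
            item
            for expression in self.expressions
            for manifestation in expression.manifestations
            for item in manifestation.items
        ]


# Hypothetical example: one work, one version, two formats, three copies.
thesis = Work(
    title="Example thesis",
    expressions=[
        Expression(
            description="author's final version",
            manifestations=[
                Manifestation("application/pdf", [Item("repository A"), Item("repository B")]),
                Manifestation("text/html", [Item("repository A")]),
            ],
        )
    ],
)

print(len(thesis.all_items()))
```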

Names appear in multiple places in the metadata, and different parts have different rules. Do we have a name agent service (registries)? We need services to gather metadata automatically; that way you might introduce consistency and interoperability.

Kylie Pappalardo from QUT’s OAK Law project is talking about the legal issues in managing research data so that it can be included in a repository and accessed by others. There are Government statements in favour of openness (eg Carr: More than one way to innovate, also the “Venturous Australia” strategy). To implement these policies we need changes to practice and culture, institutional engagement, legal issues being addressed, etc. Data is surrounded by law (!): copyright, contract, patents, policies, confidentiality, privacy, moral rights. There are conflicting legal rights: who can do what with the data? QUT has OAK Law and also the Legal Framework for e-Research project.

Survey online, May 2007, 176 participants responded. 50 depositing data in database; of those 46% said available openly, 46% required some or complete restrictions. 54% said their organisation did NOT have a data policy at all; where they did have a policy, most were guidelines. 55% said they prepared plans for data management; two thirds of these at time of proposal, balance later. It should be early, not least because data management costs money and should be part of the proposal; also, disputes can be hard to resolve later. 7% felt that clearer info on sharing and re-use would help, and 90% wanted a “plain English” guide (who wouldn’t?). Lawyer language doesn’t help, so researchers make their own informal agreements… maybe OK if nothing goes wrong.

The group has a report: Analysis of Legal Context of Infrastructure for Data Access and Re-use in Collaborative Research. Also Practical Data Management: a Legal and Policy Guide. They have some tools, including a “simple” Data Management (legal) toolkit to fill in to gather information about (eg) copyright ownership etc.

Peter Sefton of USQ talking about OAI-ORE, and what it can do for us. Making the point that we build things from a wide variety of standard components, which (mostly) work pretty well together, eg found bricks in a garden wall… OAI-PMH mostly works, moving metadata from one place to another. But it’s just the messenger, not the message. So a harvest of metadata across multiple repositories shows wide variations in the keywords, subjects etc. Problems with XACML for defining access policies: no standardisation on the names of subgroups, so in the end it’s no use for search. Point being that these standards may appear important but not work well in practice.
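The keyword variation problem is easy to picture with a toy example. The sketch below is not from the talk: the repository names and keyword strings are made up, and `normalise` is a deliberately naive hypothetical helper. It groups harvested keywords by a normalised form to show how many spellings of one concept turn up, and which ones simple normalisation still misses.

```python
from collections import defaultdict

# Hypothetical (repository, keyword) pairs, as if harvested over OAI-PMH.
harvested = [
    ("repo-a", "Digital Preservation"),
    ("repo-b", "digital preservation"),
    ("repo-b", "Digital preservation."),
    ("repo-c", "Preservation, Digital"),
]


def normalise(keyword):
    """Naive normalisation: trim whitespace, drop trailing full stops, ignore case."""
    return keyword.strip().rstrip(".").casefold()


# normalised form -> set of the raw spellings that map to it
variants = defaultdict(set)
for repo, keyword in harvested:
    variants[normalise(keyword)].add(keyword)

for form, spellings in sorted(variants.items()):
    print(form, sorted(spellings))
```

Case and punctuation differences collapse together, but the inverted form "Preservation, Digital" stays separate, which is the kind of inconsistency that a transport protocol alone cannot fix.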

So on to ORE (Object Re-use and Exchange)… Pete asks “shouldn’t it be exchange, then re-use?”. ORE view of a blog post: a resource map describes the post, but in fact it’s an aggregation (compound object) of HTML text, a couple of images, comments in separate HTML, etc. The aggregation does have a URI, but does not have a fetchable reality (the resource map does). Can get complex very rapidly. See Repository Challenge at OpenRepositories 2008 in Southampton, and the ORE Challenge at RepoCamp 2008. USQ participating with Cambridge in JISC-funded TheOREM project, also internal image project called The Fascinator. Based on his ICE system that has been mentioned before, integrated with ORE tools to push stuff into the repository. Can have the repository watching so it can get it for itself.
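The blog post example can be sketched as a handful of triples. This is a simplified illustration rather than a complete or valid resource map: the example.org URIs are hypothetical, plain Python tuples stand in for a real RDF library, and only the `describes` and `aggregates` terms from the ORE vocabulary are used.

```python
# Namespace of the ORE vocabulary terms.
ORE = "http://www.openarchives.org/ore/terms/"

# Hypothetical URIs: the resource map is the fetchable document,
# the aggregation is the abstract compound object it describes.
resource_map_uri = "http://example.org/blog/post-1.rdf"
aggregation_uri = "http://example.org/blog/post-1#aggregation"

# The parts of the blog post (all hypothetical).
aggregated = [
    "http://example.org/blog/post-1.html",
    "http://example.org/blog/images/figure-1.png",
    "http://example.org/blog/images/figure-2.png",
    "http://example.org/blog/post-1/comments.html",
]

# (subject, predicate, object) triples
triples = [(resource_map_uri, ORE + "describes", aggregation_uri)]
triples += [(aggregation_uri, ORE + "aggregates", part) for part in aggregated]


def objects(subject, predicate):
    """Return every object for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


print(objects(resource_map_uri, ORE + "describes"))
print(len(objects(aggregation_uri, ORE + "aggregates")))
```

Note that nothing here is fetched from the aggregation URI itself; everything a client learns comes from the resource map.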

ORE can: supplement OAI-PMH for moving content around; improve research tools like Zotero; replace use of METS packages; allow “thesis by publication” more elegantly; and pave the way for a repository architecture that understands content models (no more discussion of atomistic versus compound objects).

1 comment:

  1. Notes from my talk here:


