Thursday, 7 August 2008

Repositories and the CRIS

As I mentioned in the previous post, there has been some discussion in the JISC Repositories task force about the relationship between repositories and Current Research Information Systems (CRIS). Stuart Lewis asserted, for example, that “Examples of well-populated repositories such as TCD (Dublin) and Imperial College are backed by CRISs.” So it seems worthwhile to look at the CRIS with repositories in mind.

There has been quite a bit of European funding of the CRIS concept; not surprising perhaps, as research funders would be significant beneficiaries of the standardisation of information that could result. An organisation (EuroCRIS) has been created, and has generated several versions of a data model and interchange standard (which it describes as the EC-recommended standard CERIF: Common European Research Information Format), of which the current public version is known as CERIF 2006 V1.1. CERIF 2008 is under development, and a model summary is accessible; no doubt many details are different, but it does not appear to have radically changed, so perhaps CERIF is stabilising. Parts of the model are openly available at the EuroCRIS web site, but other parts are available only to EuroCRIS members. This blog post is based only on part of the publicly accessible information.

I decided to have a look at the CERIF 2006 V1.1 Full Data Model (EuroCRIS, 2007) document, with a general aim of seeing how helpful it might be for repositories. Note this is not in any sense a cross-walk between the CERIF standards and those applicable to repository metadata.

Quoting from a summary description of the model:
“The core CERIF entities are Person, OrganisationUnit, ResultPublication and Project. Figure 1 shows the core entities and their recursive and linking relationships in abstract view. Each core entity recursively links to itself and moreover has relationships with other core entities. The core CERIF entities represent scientific actors and their main research activities.
Figure 1: CERIF Core Entities in Abstract View”

The loops on each entity in the diagram above represent the recursive relationships, so a publication – publication relationship might represent a citation, or one of a series, or a revision, etc. Entity identifiers all start cf, which might help make the following extract more understandable:
“The core CERIF entities represent scientific actors (Persons and Organisations) and their main research activities (Projects and Publications): Scientists collaborate (cfPers_Pers), are involved in projects (cfProj_Pers), are affiliated with organisations (cfOrgUnit_Pers) and publish papers (cfPers_ResPubl). Projects involve people (cfProj_Pers) and organisations (cfProj_OrgUnit). Scientific publications are published by organisations (cfOrgUnit_ResPubl) and refer to projects (cfProj_ResPubl), publications involve people (cfPers_ResPubl), organisations store publications (cfResPubl), support or participate in projects (cfProj_OrgUnit), and employ people (cfPers_OrgUnit). To manage type and role definitions, references to classification schemes (cfClassId; cfClassSchemeId) are employed that will be explained separately…”
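To make the naming pattern in that extract concrete, here is a minimal sketch in Python. It is not the CERIF schema: the `Entity` and `Link` classes, the `link` helper, the role strings and the sample people and papers are all hypothetical, and the only thing taken from the extract is the convention that a link entity is named after the two core entities it joins.

```python
from dataclasses import dataclass

# Hypothetical, much simplified stand-ins for the CERIF core entities.
@dataclass(frozen=True)
class Entity:
    kind: str   # "cfPers", "cfOrgUnit", "cfProj" or "cfResPubl"
    id: str
    label: str

@dataclass(frozen=True)
class Link:
    table: str      # name of the link entity, e.g. "cfPers_ResPubl"
    source: Entity
    target: Entity
    role: str       # stands in for a cfClassId / cfClassSchemeId reference

def link(source: Entity, target: Entity, role: str) -> Link:
    """Create a link whose name is derived from the two entity kinds."""
    # Drop the "cf" prefix of the second kind: cfPers + cfResPubl -> cfPers_ResPubl
    table = f"{source.kind}_{target.kind[2:]}"
    return Link(table, source, target, role)

# Made-up sample data, for illustration only.
alice = Entity("cfPers", "p1", "A. Researcher")
bob = Entity("cfPers", "p2", "B. Colleague")
paper = Entity("cfResPubl", "r1", "An example paper")
project = Entity("cfProj", "j1", "An example project")

links = [
    link(alice, bob, "collaborator"),      # recursive person-person link
    link(alice, paper, "author"),
    link(project, alice, "investigator"),
    link(project, paper, "output"),
]
tables = [l.table for l in links]
```

In this sketch the role lives on the link rather than on either entity, which is my reading of why the extract refers out to classification schemes for type and role definitions.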
The model adds other “second-level” entities:
Figure 5: CERIF 2nd Level Entities in Abstract View

As Ian Stuart wrote in the repositories discussion:
“The current repository architecture we have is a full deposit for each item (though some Repositories jump round this with some clever footwork) - which runs straight into the "keystroke" problem various people have identified.

With a CRIS-like architecture, the user enters small amounts of [meta-]data, relevant to the particular thing - be it a research grant; a presentation at a conference; a work-in-progress; a finished article; whatever.... and links them together. It is these links that then allows fuller metadata records to be assembled.”
It should be clear that having this much information available should make populating a repository a relatively trivial task. It is less clear how easily the metadata will transfer: Dublin Core metadata does appear to have been used in the model, but this document does not say whether it is mandatory or optional. I suppose it’s also not clear how much the CERIF standard is used in actual implementations.
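As a rough illustration of the "enter small records, then link them" idea, the sketch below assembles a fuller record for one publication by following links. Everything in it is an assumption made for illustration: the record identifiers, the relation names, the sample values, and in particular the mapping onto Dublin Core element names (title, date, creator, contributor, relation), which is not taken from CERIF.

```python
# Small records, each entered once; all values are made up for illustration.
records = {
    "pers:1": {"type": "person", "name": "A. Researcher"},
    "org:1": {"type": "organisation", "name": "Example University"},
    "proj:1": {"type": "project", "title": "Example Project"},
    "publ:1": {"type": "publication", "title": "An example article", "date": "2008"},
}

# Links between records as (from_id, relation, to_id) triples.
# The relation names are hypothetical, not CERIF role definitions.
links = [
    ("publ:1", "author", "pers:1"),
    ("publ:1", "output_of", "proj:1"),
    ("pers:1", "affiliated_with", "org:1"),
]

def linked_ids(record_id, relation):
    """Return the ids reached from record_id via the given relation."""
    return [dst for src, rel, dst in links if src == record_id and rel == relation]

def assemble(publication_id):
    """Build a fuller, Dublin Core-like record by following links."""
    publ = records[publication_id]
    authors = linked_ids(publication_id, "author")
    # Affiliations come from the authors' own links, not from the publication.
    orgs = [o for a in authors for o in linked_ids(a, "affiliated_with")]
    projects = linked_ids(publication_id, "output_of")
    return {
        "title": publ["title"],
        "date": publ["date"],
        "creator": [records[a]["name"] for a in authors],
        "contributor": [records[o]["name"] for o in orgs],
        "relation": [records[p]["title"] for p in projects],
    }

record = assemble("publ:1")
```

Nobody typed the institution name against the article here; it arrives through the author's affiliation link, which is the saving on keystrokes that Ian Stuart describes.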

I’ll leave the penultimate word to Stuart Lewis, who wrote
“[The CRIS] doesn't replace the repository, but offers a management tool for research. Inputs to the CRIS can include grant record systems, student record systems, MIS systems etc, and outputs can include publication lists and repositories… Now we just need an open source CRIS platform... E.g.:”

Note, I have not checked out any of the above products in any detail, although the 3rd one appears to be a non-responsive website; use at your own risk! And it should be obvious that anything that links into MIS systems in any institution is likely to require a major implementation effort. It may also suffer from institutional silo effects (links to my institution’s management information, but not to my collaborators’ information).


EuroCRIS. (2007). CERIF2006-1.1 Full Data Model (FDM) Model Introduction and Specification (Recommended standard).



Please note that this blog has a Creative Commons Attribution licence, and that by posting a comment you agree to your comment being published under this licence. You must be registered to comment, but I'm turning off moderation as an experiment.