I wrote about RDF-encoding contact information a little earlier and had some very helpful comments. On reflection, and after exploring the “View Source” options for a couple of institutional contact pages, I’ve had some further thoughts.
- Contacts pages are rarely authored by hand; they are nearly always created on the fly from an underlying database. This makes them natural for expressing in RDF (or microformats). It’s just a question of tweaking the way the HTML wrapper is assembled. Bath University’s Person Finder pages do encode their data in microformats.
- I wondered why more universities don’t encode their data in microformats or (even better) in RDF for Linked Data. One possible answer is that the contact pages were probably one of the earliest examples of constructing web pages from databases. It works, it ain’t broke, so they haven’t needed to fix it! If so, a reasonable case would need to be made for any change, but once made it would be comparatively cheap to carry out.
- A second problem is that it is not at all clear to me what the best encoding and vocabulary for institutional (or organisational unit) contact pages might be. So maybe it’s even less surprising that things have not changed. To say I'm confused is putting it mildly! So what follows lists some of the options after further (but perhaps not complete) investigation...
One approach is the hCard microformat, based on the widely used vCard specification, RFC2426 (this is what Bath uses). That’s fine as far as it goes, but microformats don’t seem to fit directly in the Linked Data world. I’m no expert (clearly!), but in particular, microformats don’t use URIs for the names of things, and don’t use RDF. They appear useful for extracting information from a web page, but not much beyond that (I guess I stand to be corrected here!).
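To make the "extracting information from a web page" point concrete, here is a minimal sketch of pulling hCard properties out of HTML using only Python's standard library. The sample markup, the person and the phone number are all invented for illustration, and the sketch covers only a handful of hCard class names; a real hCard parser does more (for example, taking an email address from the `href` attribute rather than the link text).

```python
from html.parser import HTMLParser

# A small subset of hCard property class names (assumption: enough for a demo).
HCARD_PROPS = {"fn", "org", "title", "email", "tel"}
# Elements that never get a closing tag, so they must not go on the stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class HCardParser(HTMLParser):
    """Collects text from elements whose class names an hCard property."""

    def __init__(self):
        super().__init__()
        self.cards = []    # one dict per element with class="vcard"
        self._stack = []   # hCard property (or None) for each open element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if "vcard" in classes:
            self.cards.append({})
        prop = next((c for c in classes if c in HCARD_PROPS), None)
        self._stack.append(prop)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or not self.cards:
            return
        # Attribute the text to the innermost enclosing hCard property, if any.
        prop = next((p for p in reversed(self._stack) if p), None)
        if prop:
            self.cards[-1].setdefault(prop, []).append(text)


# Invented sample markup, not taken from any real contact page.
SAMPLE = """
<div class="vcard">
  <span class="fn">Jane Example</span>,
  <span class="title">Research Officer</span>,
  <span class="org">Example University</span><br>
  <a class="email" href="mailto:jane@example.org">jane@example.org</a>
  <span class="tel">+44 1225 000000</span>
</div>
"""

parser = HCardParser()
parser.feed(SAMPLE)
print(parser.cards[0])
```

Note that the result is just a bag of strings keyed by property name: nothing in it is a URI, which is the gap between microformats and Linked Data described above.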
Looking at RDF-based encodings, there are options based on vCard, there are FOAF and SIOC (both really coming from a social networking view point), and there’s the Portable Contacts specification.
Given that vCard is a standard for contact information, it would seem sensible to look for a vCard encoding in RDF. It turns out that there are two RDF encodings of vCard, one supposedly deprecated, and the other apparently unchanged since 2006. I now discover an activity to formalise a W3C approach in this area, with a draft submission to W3C edited by Renato Iannella and dating only from last December (2009), but I would need a W3C username and password to see the latest version, so I can't tell how it's going.
Someone asked me a while ago who sets the standards for Linked Data vocabularies. My response at the time was that the users did, by choosing which specification to adopt. At the time, FOAF seemed to have most mentions in this general area, and I rather assumed (see the previous post) that it would have the appropriate elements. In practice the “Friend of a Friend” angle really does seem to dominate: the vocabulary is more about relationships, and lacks some of the elements needed for a contacts page. I suspect this might have stemmed from a desire to stop people compromising their privacy in a spam-laden world, but those of us in public service posts often need to expose our contact details. FOAF does at least have email as foaf:mbox, which apparently includes phone and fax as well, as you can see from the sample FOAF extract in my earlier post.
In a tweet Dan Brickley suggested: “We'll probably round out FOAF’s address book coverage to align with Portable Contacts spec”, so I had a look at the latter. The main web site didn’t answer, but Google’s cache provided me with a draft spec, which does appear to have the elements I need.
What elements do I need for a contact page? Roughly I would want some or all of:
- Name
- Job title/role in DCC (my virtual organisation)
- (Optional job title/role in home organisation)
- Organisational unit/Organisation
- Address/location
- Phone/fax numbers
- Email address
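As a rough illustration of how most of these fields might look in RDF, the sketch below writes one contact record out as Turtle. It is only one possible mapping, not a recommendation: it mixes FOAF terms (`foaf:name`, `foaf:mbox`, `foaf:phone`) with terms from the 2006 vCard-in-RDF namespace (`v:title`, `v:org`, `v:organization-name`, `v:organization-unit`), and the vCard property names are written from memory and should be checked against the ontology before use. The person, URI, email and phone number are invented, and the address is left out for brevity.

```python
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "v": "http://www.w3.org/2006/vcard/ns#",
}


def literal(text):
    """Quote a Python string as a Turtle string literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def contact_to_turtle(uri, contact):
    """Serialise one contact dict as a Turtle description of `uri`."""
    org = (
        f"v:org [ v:organization-name {literal(contact['org'])} ; "
        f"v:organization-unit {literal(contact['unit'])} ]"
    )
    props = [
        "a foaf:Person",
        f"foaf:name {literal(contact['name'])}",
        f"v:title {literal(contact['title'])}",
        org,
        # Email and phone are given as URIs rather than plain strings.
        f"foaf:mbox <mailto:{contact['email']}>",
        f"foaf:phone <tel:{contact['phone']}>",
    ]
    lines = [f"@prefix {p}: <{ns}> ." for p, ns in PREFIXES.items()]
    lines.append("")
    lines.append(f"<{uri}>")
    lines.append(" ;\n".join("    " + p for p in props) + " .")
    return "\n".join(lines)


# Hypothetical record; in practice this would come from the staff database.
contact = {
    "name": "Jane Example",
    "title": "Research Officer",
    "org": "Example University",
    "unit": "Example Unit",
    "email": "jane@example.org",
    "phone": "+44-1225-000000",
}

turtle = contact_to_turtle("http://example.org/staff/jane#me", contact)
print(turtle)
```

The useful difference from the microformat version is that the person, the mailbox and the phone number are all identified by URIs, so another site can point at them.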
So what could I do if this information were expressed in RDF in the contact pages for a partner institution (say UKOLN at Bath)? Well, presumably the DCC contact pages would be based on a database showing the staff who work on the DCC, with the contact information directly extracted from the remote pages (either linked in real time or perhaps cached in some way). And if Bath changed their telephone numbers again, our contact details would remain up to date. But more. Given that there are some staff members who have roles in several projects, it would be easy to see what the linkages were between the DCC and the other project (eg RSP in the past, or I2S2 now). Part of the point of Linked Data (rather than microformats) is that one can reason with it; follow the edges of the great global graph…
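The "linkages between projects" idea can be sketched without any RDF tooling at all, by treating the data as a list of subject/predicate/object triples. Everything here is made up for illustration: the `ex:` names, the `ex:worksOn` and `ex:basedAt` predicates, and the people. A real version would load triples from the partner sites and would more likely use SPARQL.

```python
from collections import defaultdict

# Each triple is (subject, predicate, object). All "ex:" names are invented.
TRIPLES = [
    ("ex:alice", "ex:worksOn", "ex:DCC"),
    ("ex:alice", "ex:worksOn", "ex:I2S2"),
    ("ex:bob", "ex:worksOn", "ex:DCC"),
    ("ex:carol", "ex:worksOn", "ex:I2S2"),
    ("ex:alice", "ex:basedAt", "ex:UKOLN"),
]


def shared_staff(triples, project_a, project_b):
    """Return the people who work on both projects, sorted by identifier."""
    projects = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == "ex:worksOn":
            projects[subject].add(obj)
    wanted = {project_a, project_b}
    return sorted(person for person, ps in projects.items() if wanted <= ps)


print(shared_staff(TRIPLES, "ex:DCC", "ex:I2S2"))  # → ['ex:alice']
```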
And perhaps I would be able to find a simple app that extracts a vCard from the contact page to import into my Mac’s Address Book, which is where I started this search from! You wouldn’t think it would be hard, would you? I mean, this isn’t rocket science, surely?
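For what it's worth, the last step (turning already-extracted contact fields into something an address book can import) is the easy part. Below is a minimal sketch that writes a vCard 3.0 record in the style of RFC 2426. The field names in the input dict and the sample person are hypothetical, and the sketch skips details a real exporter would need, such as folding long lines.

```python
def escape(value):
    """Escape backslashes, semicolons, commas and newlines in a vCard text value."""
    return (
        value.replace("\\", "\\\\")
        .replace(";", "\\;")
        .replace(",", "\\,")
        .replace("\n", "\\n")
    )


def to_vcard(contact):
    """Build a vCard 3.0 string from a dict of already-extracted fields."""
    full_name = f"{contact['given']} {contact['family']}"
    lines = [
        "BEGIN:VCARD",
        "VERSION:3.0",
        # N is structured: family;given;additional;prefixes;suffixes
        f"N:{escape(contact['family'])};{escape(contact['given'])};;;",
        f"FN:{escape(full_name)}",
        # ORG is structured: organisation name, then organisational unit
        f"ORG:{escape(contact['org'])};{escape(contact['unit'])}",
        f"TITLE:{escape(contact['title'])}",
        f"TEL;TYPE=WORK,VOICE:{contact['phone']}",
        f"EMAIL;TYPE=INTERNET:{contact['email']}",
        "END:VCARD",
    ]
    # vCard lines are terminated with CRLF.
    return "\r\n".join(lines) + "\r\n"


# Hypothetical record, e.g. as produced by an hCard or RDF extraction step.
card = to_vcard({
    "given": "Jane",
    "family": "Example",
    "org": "Example University",
    "unit": "Example Unit",
    "title": "Research Officer",
    "phone": "+44 1225 000000",
    "email": "jane@example.org",
})
print(card)
```

Saved to a file with a `.vcf` extension, output of this shape is what address book applications generally expect to import.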
You could use the contact ontology:
http://www.w3.org/2000/10/swap/pim/contact#
It gives the relations to describe offices and places and their fixed phone numbers. What is a bit missing still is a relation between a person and the organisation he is working for.
(see my foaf for usage:
http://bblfish.net/people/henry/card )
(by the way your openid authentication does not seem to work)
Chris
A developer friend (an historian) reckons that there are some Technorati tools that would do what you want re extracting vCard data. They are at:
hCard - http://h2vx.com/vcf/
hCalendar - http://h2vx.com/ics/
I haven't looked at them but my friend thinks they would do what you are asking for.
Sorry - the second one of these does the same for calendar events that are embedded in web pages.