Wednesday 13 January 2010

Persistence of domain names

I had a chat before Christmas with Henry Thompson, who works in Edinburgh Informatics and is also on the W3C TAG. Insofar as the Internet is important in sustaining long-term access to information in digital form, there is a sustainability problem that we seem to have largely ignored. Everything on the Internet (literally) depends on domain names, and these are only ever rented. There is no mechanism for permanently reserving a domain name. Domain names can be lost by mistake (overlooking a bill, perhaps having moved in the interim without informing the relevant domain name registrar), but they can also be lost on business failure. Although domain names can be a business asset, I understand that the registrars have some discretion on transfers, and in particular one cannot make a "domain name will" seeking transfer of the domain name to some benevolent organisation. Note that the rental mechanism does have a sustainability advantage of its own: the fees provide ongoing revenue for the important services that underpin the DNS.

There are two kinds of problem, one on a massive scale and one more fine-grained. The massive problem is that the entire infrastructure of the Internet depends on URIs, most of which are http URIs that in turn depend on the domain name system. So there are a number of organisations whose domain names are embedded in that infrastructure in a way and to an extent that is very difficult to change. W3C is clearly such an organisation. Many of these organisations seem rather fragile (not a comment on W3C, by the way, although its sustainability model is opaque to me). Should they fail and the domain names disappear, the relevant URIs will cease to work and various pieces of Internet machinery will fall apart.

(By the way, this does seem to be one case where a persistent ID that is independent of the original domain, such as a DOI, has advantages over an HTTP URI plus a redirect table. If the domain name no longer exists, you can't get to a redirect, whereas someone can still relink the DOI to a new location.)
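To make that aside concrete, here is a minimal sketch of the two models. It is a toy simulation, not how any real resolver is implemented: the domains old.example, new.example and resolver.example, the identifier 10.9999/report, and the functions fetch and resolve are all made up for illustration. It also assumes the resolver's own domain stays registered, and it does not guard against redirect loops.

```python
from urllib.parse import urlsplit

# Hypothetical stand-in for the host of a DOI-style resolver.
RESOLVER_HOST = "resolver.example"


def fetch(url, live_domains, redirects):
    """Return the final URL reached from `url`, or None if a domain has lapsed.

    `redirects` models redirect tables: each entry can only be served
    while the domain of its source URL is still registered.
    """
    host = urlsplit(url).hostname
    if host not in live_domains:
        return None  # the name no longer resolves, so no redirect is reachable
    target = redirects.get(url)
    if target is None:
        return url  # nothing to follow: the content is served here
    return fetch(target, live_domains, redirects)


def resolve(identifier, live_domains, resolver_table, redirects):
    """Look up an identifier at the resolver, then fetch its current URL."""
    if RESOLVER_HOST not in live_domains:
        return None  # the resolver's own domain has to survive too
    url = resolver_table.get(identifier)
    if url is None:
        return None
    return fetch(url, live_domains, redirects)


if __name__ == "__main__":
    live = {"old.example", "new.example", RESOLVER_HOST}
    redirects = {"http://old.example/report": "http://new.example/report"}
    table = {"10.9999/report": "http://old.example/report"}

    live.discard("old.example")  # the registration lapses
    print(fetch("http://old.example/report", live, redirects))  # None

    table["10.9999/report"] = "http://new.example/report"  # relink the identifier
    print(resolve("10.9999/report", live, table, redirects))  # http://new.example/report
```

The point of the sketch is only that the redirect entry lives behind the lost name, while the identifier's mapping lives somewhere else and can still be edited after the loss.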

On the more fine-grained scale, many documents (particularly in HTML) are not easily separable from their location, because they depend on other local files and documents. In addition, of course, documents in some sense exist through their citations or bookmarks, which begin to exist separately from the document. Moving a document to a new domain can make it "fail" or disappear. So sustainability is linked to the domain as well as to the other preservation factors.

This seems to me to be not at all a technical problem; rather, it has legal/regulatory, governance, social, business and economic aspects.

Among the solutions might be creating a new top level domain designed for persistence, with different rules of succession, etc. Another (either instead of or in conjunction with the first) might be creating an organisation designed for persistence, to hold endowed domain names. Somehow the ongoing revenue stream for those underpinning services must be retained indefinitely into the future.

We don't think we have the answers, but we do think there is a problem here; I'm not yet sure if we have articulated it accurately at all. I would appreciate any comments. Thanks,

8 comments:

  1. Chris, one approach to identifier persistence is opaqueness, as exemplified by DOIs and such. Another, less well known approach is timestamping, as exemplified by tag URIs (http://tools.ietf.org/html/rfc4151). A tag URI combines a domain name with a timestamp. For example, the URI tag:dcc.ac.uk,2010:someidentifier... refers to something created by whoever owned the dcc.ac.uk domain in 2010. This doesn't address persistence of resources or domain names at all, of course, but the approach does at least provide a technical means of avoiding namespace and identifier collision. -Greg Janee

  2. Greg, Happy New Year (I'm guessing you guys are getting better weather than we are in the UK at the moment!). It's not so much the identifier opaqueness that I meant to comment on in respect of DOIs, but the fact that the identifier is independent of the domain of the target identifier. So if we lost control of dcc.ac.uk, but still had the resources, we could rehost at newdcc.ac.uk, but we could not put a redirect table at dcc.ac.uk. However, we could re-point the DOIs to the new location.

    John Kunze has a nice slide somewhere showing purported advantages of DOIs, which goes on to show that they are no better than HTTP URIs for those features. This is the first advantage I can think of that the simple, well-managed HTTP URI can't match.

    Of course, someone has to keep running doi.org or its equivalent, too!

    Chris

  3. Hi Chris,

    I'm wondering: why do we need to preserve domain names?

    As long as archives are preserving whole websites with time/date stamps associated with them, then we have a record of which domain name was associated with which website at any time/date in the repository.

    Of course there is a huge issue with maintaining functional and useful access to websites over time, including how you would possibly emulate a historical "internet" in which all domain names resolved to their historical servers/websites, but for that I have no solutions to offer.

    Perhaps I misunderstood the problem?

    Cheers,

    Euan Cochrane

  4. Just to clarify, we do need to preserve domain names and their ownership history in repositories/archives, but I can't understand why people would need to reserve them indefinitely.

    Cheers,

    Euan

  5. A lot of us benefit already from the relative security of .edu domains. While US universities do change their domains, I've never heard of a US education institution losing its domain in a way that left it unable to forward or redirect requests. Does ac.uk have very different governance? Is there a real risk of losing dcc.ac.uk to another unfriendly organization?

    Links broken on a finer scale are a pain, but are really a symptom of poor planning and policy, not a fault of web architecture. We're a lot more savvy these days about designing good URIs and maintaining or properly migrating them as our infrastructures change.

    An international domain specifically for libraries is an attractive idea. Could it provide the best of both approaches? Domain and identifier permanence combined with the web's freedom to innovate with objects and identifiers?

  6. @Euan, if you regard content as independent pieces, then the domain names are pretty much irrelevant and preservation can consist of moving stuff into a box somewhere. But if content is "of the web", in the sense that people know where it is via their bookmarks and references, then losing your domain name and re-creating the content elsewhere can make it almost disappear (and lose its search ranking along with all the in-bound pointers).

    As well, we tend to incorporate many domains into our documents by reference even if they are never visible to us. I mean all that stuff in the header about name spaces etc.

    BTW on your second point, Henry Thompson was also concerned about preserving that history of domain name ownership. It doesn't appear as if there is a solution to that, although the various registries might have one in place (not accessibly, though, AFAIK).

  7. @sgillies ac.uk does have a relatively benign registrar, so it is unlikely that dcc.ac.uk would go to a porn site even if we did forget to pay (and there is a risk, as for historical reasons a different division of the university registered and pre-paid the domain name). However, I was really using us as a hypothetical example, rather than picking on some other body.

    I should say I'm a supporter of good URIs (he said nervously, knowing that his site is moving to a new provider shortly, and not knowing quite what will happen). But, to pick another example, you can set up a redirect from rlg.org to oclc.org while you still have control of the rlg.org domain name. So in theory all those old RLG Diginews citations many of us have could simply redirect to the new home (they don't, but that's another story). You can't do it for rlg.ac.uk, as someone chose not to renew that domain, so the bookmarks I have into their site just fail.

  8. Chris,

    In the short term, content disappearing is a difficult issue. But hopefully in time services will exist (like the Wayback Machine/Internet Archive but more comprehensive/complex) that will become normal places for people to go when their links are broken. The Memento project is doing some interesting work with this (http://mementoweb.org/).
    I hadn't considered the idea of search rankings. How would historical versions of the web deal with access? I guess they could index time/date stamped versions and users would then be able to search versions of the web from particular dates/times. That would be a technical challenge!


Please note that this blog has a Creative Commons Attribution licence, and that by posting a comment you agree to your comment being published under this licence. You must be registered to comment, but I'm turning off moderation as an experiment.