Thoughts on permanent identifiers
In re:
http://ptsefton.com/blog/2006/11/01/repository-maintenance
Some random thoughts on permanent identifiers (my day job), triggered from Peter Sefton’s post above.
- The HTTP proxy address used to resolve a Handles (or whatever else) permanent identifier for a resource binds the permanent identifier to a particular protocol (HTTP) and a particular host (hdl.handle.net, arrow.monash.edu.au, whatever). This has the advantage of actually working in the current web infrastructure, which a URI based on Handles (or whatever) does not. This in turn links up with Norman Walsh's contention that "if I want DNS I know where to find it", i.e. why come up with and fund a shadow of DNS in Handles (or whatever) when DNS is already working? That's a question I'm not getting into yet, but it is true enough that HTTP addresses are real, and hdl: URIs (or whatever) are currently not supported outside a very small number of browsers.
- However, there is nothing permanent about an HTTP link to begin with — that’s the whole point of having a persistent identifier that isn’t a URL. After all, there may not always be an HTTP; and HTTP URLs as is have a half-life of what, six months? As Sefton points out, there may also not always be an arrow.monash.edu.au, so rewriting URLs containing arrow.monash to something else is a big risk. A plus of having a national infrastructure for identifiers would be that, while there may not always be an arrow.monash (or even, heavens forfend, a Monash), there will always be an Australian Government(*), and one can expect the Australian Government to always be able to resolve those identifiers.
* NOTE: by “always”, I mean of course “next few decades”. I’ll save the “I’m laminating my papers and burying them in Spitzbergen” tirade for another time.
- So a couple of things I think should happen (right now, a week into the job, and with no idea of what I'm talking about) are:
- While having the Handles-resolving URL at your HTTP proxy (http://hdl.handle.net/<HANDLE> or http://arrow.monash.edu.au/<HANDLE> ) is a good and valid and practical thing, it’s not a persistent identifier itself; just a link to one. Argal, the digital object should include a Handle URI, distinct from the HTTP link, for future-proofing’s sake. Similarly, people should be encouraged to cite the Handle URI, as well as or instead of the URL. After all, HTTP proxies can change (and will, and will be autogenerated from your repository). But the data itself should bear and contain its permanent identifier, which should travel with the digital object to wherever it ends up. To recover the <HANDLE> from the proxy URL requires that I know where the proxy ends and where the handle begins. Since a Handle can contain more than one slash, it ain’t unambiguous: given http://example.com/hdl/77/99 , I cannot know whether the handle is hdl/77/99 (naming authority: hdl) or 77/99 (naming authority: 77). And knowing which Handle proxy servers were around at the time the URL was minted shouldn’t be necessary for me to recover the identifier.
- We may have a national infrastructure for Handles (or whatever), but that need not mean national-level management of the Handles. It would be pointless to make a request to Canberra every time a repository in Australia needs to register a new object — even if the request is instantaneous and light enough not to require human intervention. One of the unsung assets of the Handles system is that individual fields of the Handle record can be managed by different administrators. To me, that means a federated identifier infrastructure; Canberra can override and step in in case of emergency or disaster, but the day-to-day management of identifiers can stay with the repository managers who actually know what’s going on in their repository.
- Accordingly, the national identifier infrastructure should make migration of permanent identifiers possible: if a naming authority is dissolved, the national-level identifier management should either pass on the naming authority to some other institution, or take over the naming authority itself. If there's no such guarantee, the identifiers are not permanent. (That is assuming there will always be an Australian government, for which see above.)
- I agree with Peter that the browser should (for RFC 2119 values of “should”) display a Handles-like rather than VITAL-like URL, since the VITAL URL is not even a shadow of a permanent identifier. A common URL format is also a “should”. But without minimising the importance of getting the HTTP links migratable, I still think it’s the Handles URI inclusion that is the “must”.
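To make the ambiguity concrete: a minimal sketch, assuming a handle has the form <naming authority>/<local name>, where the authority contains no slash and the local name may contain several. Given only a proxy URL, every path segment except the last is a possible starting point for the handle; given a hdl: URI, the handle is everything after the scheme. The function names `candidate_handles` and `handle_from_uri` are hypothetical, written for illustration only, and empty path segments are simply dropped as a simplification.

```python
from urllib.parse import urlsplit, unquote


def candidate_handles(proxy_url: str) -> list[tuple[str, str]]:
    """Return every (naming_authority, handle) reading of a proxy URL's path.

    Illustrative assumption: a handle is <authority>/<local name>, the authority
    has no slash, the local name is non-empty and may contain slashes. Since we
    don't know where the proxy's own path ends, each start segment is a guess.
    """
    path = unquote(urlsplit(proxy_url).path)
    # Simplification: empty segments (leading "/", "//") are discarded.
    segments = [s for s in path.split("/") if s]
    candidates = []
    # A handle needs at least two segments: an authority plus a local name.
    for start in range(len(segments) - 1):
        handle = "/".join(segments[start:])
        candidates.append((segments[start], handle))
    return candidates


def handle_from_uri(uri: str) -> tuple[str, str]:
    """Recover (naming_authority, handle) from a hdl: URI; no guessing needed."""
    if not uri.startswith("hdl:"):
        raise ValueError(f"not a hdl: URI: {uri!r}")
    handle = uri[len("hdl:"):]
    # The authority is everything before the first slash.
    authority, sep, local_name = handle.partition("/")
    if not sep or not authority or not local_name:
        raise ValueError(f"malformed handle: {handle!r}")
    return authority, handle


if __name__ == "__main__":
    print(candidate_handles("http://example.com/hdl/77/99"))
    # [('hdl', 'hdl/77/99'), ('77', '77/99')]
    print(handle_from_uri("hdl:77/99"))
    # ('77', '77/99')
```

The proxy URL yields two equally plausible handles, and only outside knowledge of the proxy's path layout picks between them; the hdl: URI yields exactly one. That is the case for the object carrying its Handle URI itself rather than only a link to a proxy.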
The “Wherefore Identifiers” post that preceded the above on Pete’s blog is more of a challenge; the Norman Walsh riposte and Pete’s query on full-text local names made me forget who I was and what I was doing here. I’ll come back to it when I have more time and less confusion…
Persistence is a social problem, not a technical one. There is no sequence of characters that is, by its nature, inherently more persistent than an HTTP URI.
The fact that some HTTP URIs go 404 after some period of time doesn't mean that they all will or must (well, any sooner than any other system of reference, anyway).
Rather than investing in elaborate redirection schemes that are, at bottom, no more persistent than simple URIs, it probably makes more sense to attempt to get the agencies in question to adopt a reasonable persistence policy, such as the W3C Persistence policy.