OBO format and OWL are both valid syntaxes for OBO ontologies, and the intention is that the two will eventually be fully interconvertible. One key requirement for interconversion is that both formats use the same system for handling unique identifiers, and currently the two handle them differently. OBO format uses a string of the form [prefix]:nnnnnnn, while OWL uses URIs [1]. The purpose of this document is to establish a policy for OBO identifiers, to define a correspondence between identifiers in the two encodings, and to explain the rationale behind the choice that has been made.
The policies recommended here are intended to be normative for OBO Foundry ontologies and suggested for OBO Library ontologies. They do not address the use of OBO format for other purposes. Feedback on these policies should be sent to one of two mailing lists: obo-discuss or obo-format.
This document addresses identifiers only for ontology terms, not for Dbxrefs. The intention is that the mapping of identifiers used in Dbxrefs will be based on the URIs established by the Shared Name Initiative.
Design goals
There must be a predictable, bidirectional mapping between OBO ids, Foundry-compliant URIs, and OBO legacy URIs.
The mappings between OBO format, Foundry-compliant URIs, and OBO legacy URIs are shown in Figure 1.
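A minimal sketch of the correspondence between OBO ids and Foundry-compliant URIs, assuming the rule suggested by the adopted format http://purl.obolibrary.org/obo/OBI_0100051: the colon in IDSPACE:LOCALID becomes an underscore, and the result is appended to the http://purl.obolibrary.org/obo/ base. The function names are hypothetical, and the sketch further assumes that an IDSPACE never contains an underscore. The normative rules remain the OBO_IDENTIFIER and FOUNDRY_OBO_URI productions.

```python
# Hypothetical helpers sketching OBO id <-> Foundry-compliant URI conversion.
# Assumption: IDSPACEs contain no '_', so the first '_' separates IDSPACE from local id.
FOUNDRY_BASE = "http://purl.obolibrary.org/obo/"


def obo_id_to_foundry_uri(obo_id: str) -> str:
    """Map an OBO format id such as 'OBI:0100051' to a Foundry-compliant URI."""
    idspace, sep, local_id = obo_id.partition(":")
    if not sep or not idspace or not local_id:
        raise ValueError(f"not an IDSPACE:LOCALID identifier: {obo_id!r}")
    return f"{FOUNDRY_BASE}{idspace}_{local_id}"


def foundry_uri_to_obo_id(uri: str) -> str:
    """Invert obo_id_to_foundry_uri by turning the first '_' back into ':'."""
    if not uri.startswith(FOUNDRY_BASE):
        raise ValueError(f"not a Foundry-compliant URI: {uri!r}")
    idspace, sep, local_id = uri[len(FOUNDRY_BASE):].partition("_")
    if not sep or not idspace or not local_id:
        raise ValueError(f"no IDSPACE_LOCALID local name in {uri!r}")
    return f"{idspace}:{local_id}"
```

Because each function only substitutes one separator, the two directions round-trip exactly, which is the "predictable, bidirectional" property the design goal asks for.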
Example of an ID definitional expression: the identifier GO:0005737^part_of(CL:0000023) can be used wherever one wants to say "cytoplasm of oocyte". It is treated as if it had the following definition:
[Term]
id: GO:0005737^part_of(CL:0000023)
intersection_of: GO:0005737 ! cytoplasm
intersection_of: part_of CL:0000023 ! oocyte
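The expansion above is mechanical, so it can be sketched in code. The helper below is hypothetical, not part of any OBO tool: it turns an expression of the form GENUS^relation(FILLER) into the equivalent [Term] stanza. Allowing several ^relation(FILLER) parts in a row is an assumption, since the text shows only one. Labels cannot be derived from identifiers, so the "!" comments are emitted only when a label map is supplied.

```python
import re
from typing import Dict, Optional

# Assumed shape: a genus id followed by one or more "^relation(FILLER)" parts.
_GENUS = re.compile(r"[A-Za-z][A-Za-z0-9_]*:[A-Za-z0-9_]+")
_DIFF = re.compile(r"\^([A-Za-z_][A-Za-z0-9_]*)\(([A-Za-z][A-Za-z0-9_]*:[A-Za-z0-9_]+)\)")


def expand_definitional_id(expr: str, labels: Optional[Dict[str, str]] = None) -> str:
    """Render an ID definitional expression as an OBO [Term] stanza."""
    labels = labels or {}
    genus_match = _GENUS.match(expr)
    if genus_match is None:
        raise ValueError(f"no genus id at start of {expr!r}")
    genus = genus_match.group(0)
    rest = expr[genus_match.end():]
    differentiae = _DIFF.findall(rest)
    # Reject expressions with no differentia or with text the pattern did not consume.
    if not differentiae or "".join(f"^{r}({f})" for r, f in differentiae) != rest:
        raise ValueError(f"malformed differentia in {expr!r}")

    def with_label(ident: str) -> str:
        # OBO format treats text after '!' as a comment, conventionally the term name.
        return f"{ident} ! {labels[ident]}" if ident in labels else ident

    lines = ["[Term]", f"id: {expr}", f"intersection_of: {with_label(genus)}"]
    for relation, filler in differentiae:
        lines.append(f"intersection_of: {relation} {with_label(filler)}")
    return "\n".join(lines)
```

Called with the labels {"GO:0005737": "cytoplasm", "CL:0000023": "oocyte"}, the sketch reproduces the stanza shown above line for line.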
OBO Foundry ontologies MUST use identifiers that match the production OBO_IDENTIFIER when the ontology is formatted in OBO format, and identifiers that match the production FOUNDRY_OBO_URI when it is formatted as OWL. Where an ontology is distributed in both formats, identifiers are mapped according to the substitutions defined in the section "Mapping of OWL ids to OBO format ids".
Response to Web requests for OBO URIs
Foundry-compliant URIs are expected to behave usefully on the Web. It will be the role of the OBO Foundry to supply generic software for responding to requests at URIs that identify OBO terms.
The OBO Foundry may issue further recommendations if experience shows them to be generally useful. As a baseline, we adopt the following criteria from the Shared Name Initiative (http://sharedname.org/):
It must be clearly stated what the intended referent of each URI is supposed to be, e.g. that the URI denotes some particular record from some particular database.
Information about the URI and its referent, including such a statement, must be made available, and in order to leverage existing protocol stacks, it must be obtainable via HTTP. (We will call such information "URI documentation".)
URI documentation must be provided in RDF.
Provision of URI documentation must be an ongoing concern. The ability to provide it may have to outlive the original ontology developer's group or creator.
The provider of the URI documentation must be responsive to community needs, such as the need to have mistakes fixed in a timely manner.
URI documentation must be open so that it can be replicated and reused.
Individual ontology projects may, at their discretion, choose to manage these responses, with the understanding that if service lapses the Foundry may substitute the generic software for handling them in order to maintain service.
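The criteria above require URI documentation in RDF that is obtainable over HTTP. The following is a client-side sketch using only the Python standard library. It assumes, beyond what this policy states, that a resolver honours an "Accept: application/rdf+xml" request header; the helper names are hypothetical.

```python
import urllib.request

# Assumption: the server performs content negotiation on the Accept header.
# This policy requires RDF documentation over HTTP but does not say how it is requested.
RDF_XML = "application/rdf+xml"


def uri_documentation_request(term_uri: str) -> urllib.request.Request:
    """Build (but do not send) an HTTP GET that asks a term URI for RDF/XML."""
    return urllib.request.Request(term_uri, headers={"Accept": RDF_XML}, method="GET")


def fetch_uri_documentation(term_uri: str, timeout: float = 10.0) -> bytes:
    """Dereference a term URI; urllib follows standard HTTP redirects by default."""
    with urllib.request.urlopen(uri_documentation_request(term_uri), timeout=timeout) as resp:
        return resp.read()
```

For example, fetch_uri_documentation("http://purl.obolibrary.org/obo/OBI_0100051") would follow the PURL redirect and return whatever document the resolver serves for that term.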
OBO Library ontologies are not constrained by this policy; however, we recommend that they follow it nonetheless, for three reasons. First, it provides a uniform experience and sets consistent expectations for ontology clients. Second, library ontologies that follow it will be able to take advantage of shared infrastructure. Third, library ontologies that later join the Foundry would otherwise have to change their identifiers to comply, disrupting existing users.
Allocating IDSPACEs
Each IDSPACE within the OBO Library is unique to a given project and is chosen so as not to conflict with the prefixes used for xrefs. Although IDSPACEs are case-sensitive, no two allocated IDSPACEs will ever be identical when compared case-insensitively. Thus, although "GO", "go", "Go" and "gO" are distinct IDSPACEs, "go", "Go" and "gO" will not be allocated, because "GO" already has been.
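The allocation rule amounts to a case-insensitive uniqueness check performed before a new IDSPACE is granted, while identifiers themselves stay case-sensitive. A minimal sketch; the function name is hypothetical.

```python
from typing import Iterable


def can_allocate(candidate: str, allocated: Iterable[str]) -> bool:
    """Return True if `candidate` differs, ignoring case, from every allocated IDSPACE."""
    # Only this allocation check folds case; IDSPACEs remain case-sensitive in use.
    taken = {idspace.casefold() for idspace in allocated}
    return candidate.casefold() not in taken
```

With "GO" already allocated, can_allocate("gO", ["GO"]) is False, matching the example in the text.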
A registry of allocated IDSPACEs will be maintained. Requests for an IDSPACE should be sent by email to obo-discuss@lists.sourceforge.net, with a copy to obo-admin@fruitfly.org. A request should include information about the ontology, such as its scope and maintainer, and a confirmation that the ontology is open access.
Resources at known locations
Registry of IDSPACEs: http://purl.obolibrary.org/obo/idspaces.txt
A tab-delimited text file with five columns:
1) the IDSPACE;
2) a string indicating the status of the IDSPACE, one of:
"OBOFOUNDRY" - the IDSPACE belongs to an OBO Foundry ontology
"OBOLIBRARY" - the IDSPACE belongs to an OBO ontology that is not currently a Foundry ontology
"RESERVED" - the IDSPACE is used for dbxrefs or is otherwise unsuitable as an IDSPACE for an ontology
3) the name of the point of contact;
4) the email address of the point of contact;
5) a short description of the scope.
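A sketch of a reader for the registry format just described, written against the five columns and three status values listed above. The class and function names are hypothetical; tolerating blank and "#" comment lines, and expecting no header row, are assumptions rather than documented properties of the file.

```python
import csv
from dataclasses import dataclass
from typing import Dict, Iterable

# Status values listed in the registry description.
VALID_STATUSES = {"OBOFOUNDRY", "OBOLIBRARY", "RESERVED"}


@dataclass(frozen=True)
class IdspaceEntry:
    idspace: str
    status: str
    contact_name: str
    contact_email: str
    scope: str


def parse_idspace_registry(lines: Iterable[str]) -> Dict[str, IdspaceEntry]:
    """Parse the tab-delimited, five-column IDSPACE registry, keyed by IDSPACE."""
    entries: Dict[str, IdspaceEntry] = {}
    # QUOTE_NONE: every tab is a column break, even next to quote characters.
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row or row[0].startswith("#"):
            continue  # assumption: blank and '#'-comment lines may appear
        if len(row) != 5:
            raise ValueError(f"expected 5 columns, got {len(row)}: {row!r}")
        entry = IdspaceEntry(*(field.strip() for field in row))
        if entry.status not in VALID_STATUSES:
            raise ValueError(f"unknown status {entry.status!r} for {entry.idspace!r}")
        entries[entry.idspace] = entry
    return entries
```

Any iterable of lines works, so an open file handle on a downloaded copy of idspaces.txt can be passed directly.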
Current ontology document:
The most current version of an ontology will be at the following URL, where "IDSPACE" is replaced with the IDSPACE of the given ontology in lower case.
Current OWL: http://purl.obolibrary.org/obo/IDSPACE.owl
For example, IAO distributes its ontology metadata set as a distinct document, with <name> "ontology-metadata".
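The current-version URL rule is simple enough to state as one function. Note the contrast with term URIs: the document URL lower-cases the IDSPACE, whereas a term URI such as http://purl.obolibrary.org/obo/OBI_0100051 keeps the registered case. The function name is hypothetical.

```python
BASE = "http://purl.obolibrary.org/obo/"


def current_owl_url(idspace: str) -> str:
    """URL of the most current OWL release; the IDSPACE is lower-cased here only."""
    return f"{BASE}{idspace.lower()}.owl"
```

For OBI this yields http://purl.obolibrary.org/obo/obi.owl.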
History and Rationale
The hostname chosen for Foundry-compliant URIs, by vote of the OBO Foundry editors, is purl.obolibrary.org. Because this hostname is dedicated to the purpose, redirection can be handled at the DNS level, so resolution takes no extra time and no dedicated servers are needed to handle lookups.
The initial preference was to keep IDs in the form currently used by the community (e.g. GO:nnnnnnn). However, RDF/XML and N3 (another RDF syntax) require the character after the colon to be a letter or an underscore. See the QName production in the W3C specification Namespaces in XML, and productions [7], [8], [11], [4] and [6] of that specification; the last of these, [6] NCNameStartChar ::= Letter | '_', is what prohibits a leading digit.
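The leading-digit restriction can be demonstrated with a simplified, ASCII-only reading of the productions cited above. The helper names are hypothetical, and the character classes are deliberately narrower than the specification's Letter, CombiningChar and Extender classes, so this is an illustration rather than a conforming QName validator. It shows why GO:0005737 cannot be written as a QName, whereas a prefixed name such as obo:GO_0005737 (assuming a prefix obo bound to http://purl.obolibrary.org/obo/) can.

```python
import string

# Simplified ASCII approximation of Namespaces in XML 1.0:
#   [6] NCNameStartChar ::= Letter | '_'
#   NCNameChar adds digits, '.', '-' (and Unicode classes omitted here).
_START = set(string.ascii_letters + "_")
_REST = _START | set(string.digits + ".-")


def is_simple_ncname(name: str) -> bool:
    """True if `name` starts with a letter or '_' and continues with NCName characters."""
    return bool(name) and name[0] in _START and all(c in _REST for c in name[1:])


def is_valid_qname(qname: str) -> bool:
    """Check Prefix ':' LocalPart (or an unprefixed name) using the simplified NCName."""
    prefix, sep, local = qname.partition(":")
    if not sep:
        return is_simple_ncname(qname)
    return is_simple_ncname(prefix) and is_simple_ncname(local)
```

Here is_valid_qname("GO:0005737") is False because the local part starts with "0", while is_valid_qname("obo:GO_0005737") is True.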
All entities that we define (classes, relations, and instances) are assigned IDs. URIs are opaque, and labels provide the human-readable version. Editing tools can then be configured to display the labels instead of the identifiers.
Note that OBO legacy URIs will remain supported for dereferencing ontologies for a transition period. However, applications that refer to OBO ontology terms by their legacy URIs will need to migrate to Foundry-compliant URIs for those ontologies that adopt them.
The OBO legacy URIs are of the form http://purl.org/obo/owl/OBI#OBI_0100051.
Undesirable aspects of the OBO legacy URIs are:
The adopted format, http://purl.obolibrary.org/obo/OBI_0100051, is as short as sensible while avoiding the above issues.
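Applications migrating away from legacy URIs need to rewrite references of the form http://purl.org/obo/owl/OBI#OBI_0100051 into the adopted form http://purl.obolibrary.org/obo/OBI_0100051. A minimal sketch, assuming every legacy URI repeats its IDSPACE before and after the "#" exactly as in that example, and that IDSPACEs and local ids are alphanumeric; the function names are hypothetical.

```python
import re

FOUNDRY_BASE = "http://purl.obolibrary.org/obo/"
# Assumed legacy shape: http://purl.org/obo/owl/IDSPACE#IDSPACE_LOCALID
# (\1 requires the IDSPACE after '#' to repeat the one in the path).
LEGACY_URI = re.compile(
    r"http://purl\.org/obo/owl/([A-Za-z][A-Za-z0-9]*)#\1_([A-Za-z0-9]+)"
)


def legacy_to_foundry_uri(legacy_uri: str) -> str:
    """Convert a single legacy URI to its Foundry-compliant equivalent."""
    match = LEGACY_URI.fullmatch(legacy_uri)
    if match is None:
        raise ValueError(f"not a recognised OBO legacy URI: {legacy_uri!r}")
    idspace, local_id = match.groups()
    return f"{FOUNDRY_BASE}{idspace}_{local_id}"


def migrate_text(text: str) -> str:
    """Rewrite every legacy URI embedded in a larger text, e.g. an RDF/XML file."""
    return LEGACY_URI.sub(lambda m: f"{FOUNDRY_BASE}{m.group(1)}_{m.group(2)}", text)
```

A purely textual rewrite like migrate_text suits files that spell URIs out in full; documents that abbreviate them through namespace prefixes would also need their prefix declarations updated.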
References
[1] Uniform Resource Identifier, Wikipedia, http://en.wikipedia.org/wiki/Uniform_Resource_Identifier. A Uniform Resource Identifier (URI) consists of a string of characters used to identify or name a resource on the Internet. Such identification enables interaction with representations of the resource over a network (typically the World Wide Web) using specific protocols.
[2] The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration.
[3] Changing OBI's URIs to be PURL-based, discussion on the obi-developers list.
[4] PURL, http://purl.org/. A PURL is a Persistent Uniform Resource Locator. Functionally, a PURL is a URL. However, instead of pointing directly to the location of an Internet resource, a PURL points to an intermediate resolution service. The PURL resolution service associates the PURL with the actual URL and returns that URL to the client. The client can then complete the URL transaction in the normal fashion. In Web parlance, this is a standard HTTP redirect.
[5] Domain Name System, Wikipedia, http://en.wikipedia.org/wiki/Domain_name_system. The Domain Name System (DNS) associates various sorts of information with so-called domain names; most importantly, it serves as the "phone book" for the Internet by translating human-readable computer hostnames, e.g. www.example.com, into the IP addresses, e.g. 208.77.188.166, that networking equipment needs to deliver information.
[6] The OBO Flat File Format Specification, version 1.2, http://www.geneontology.org/GO.format.obo-1_2.shtml
[7] The OBO Flat File Format Specification, version 1.3 (draft), http://www.geneontology.org/GO.format.obo-1_3.shtml
[8] OBO-Format and Obolog Specification (1.3) (draft), http://oboedit.org/obolog/spec/obolog-spec.pdf
Acknowledgments
This policy was initially discussed, drafted and implemented as part of the development of the OBI project.
Authors: Alan Ruttenberg, Melanie Courtot and Chris Mungall. Thanks to Jonathan Rees, Bill Bug, Colin Batchelor, David Osumi-Sutherland, Duncan Hull, Peter Robinson, Michel Dumontier, the OBO coordinators and the OBI Consortium for their help.