2012-08-16

Technologies of linked data: HTTP

The following post is an excerpt from my thesis entitled Linked open data for public sector information.
Linked data uses URIs with the http scheme that are handled by the Hypertext Transfer Protocol (HTTP), “an application-level protocol for distributed, collaborative, hypermedia information systems” [1, p. 6]. HTTP is the default interaction protocol for linked data and is used for data exchange, querying, updates, and so forth. Linked data uses HTTP in accordance with the constraints of the Representational State Transfer architectural style, which is described in the next section.

Representational State Transfer

The resource-oriented architecture of linked data may be considered a style that builds on Representational State Transfer (REST). REST is an architectural style for stateless communication in distributed client-server applications, such as the World Wide Web. Roy Fielding, the author of REST, defines an architectural style as a “coordinated set of architectural constraints that has been given a name for ease of reference” [2, p. xvi]. In his doctoral dissertation, Fielding defines four interface constraints for REST:
  • identification of resources with URIs
  • manipulation of resources through their representations
  • self-descriptive messages
  • hypermedia as the engine of application state
Linked data interfaces adopt these constraints and add further constraints on top of them, based on the Linked Data Principles.

Dereferencing

Dereferencing is a basic mechanism built on REST that linked data employs for interaction with URIs. By minting a URI in a namespace, namespace owners “enter into an implicit social contract with users of their data” [3] and should therefore be aware that “there are social expectations for responsible representation management by URI owners” [4]. Users of URIs expect that dereferencing mechanisms are implemented for the URIs and that they work in a predictable manner.
Dereferencing is an idempotent operation on a URI that exchanges a reference to a resource for the resource itself. An HTTP agent (e.g., a web browser) that dereferences a URI issues an HTTP GET request for the resource’s reference (i.e., a URI), and the HTTP server administering this reference replies with a response containing the resource or its representation. The response should be accompanied by a correct HTTP Content-Type header indicating the data format of the response, expressed as a Multipurpose Internet Mail Extensions (MIME) type. Dereferencing can be indirect because redirects may be employed, which is a common practice especially for persistent URIs and non-information resources.
According to the Architecture of the World Wide Web [4] there are two kinds of resources, information resources and non-information resources, to which different dereferencing mechanisms apply. An information resource is “a resource which has the property that all of its essential characteristics can be conveyed in a message” [Ibid.], and so it may be transferred via HTTP (e.g., HTML or PDF files). For example, http://dbpedia.org/page/Czech_Republic is a URI of an information resource identifying a page about the Czech Republic. Non-information resources are those resources that cannot be transferred via HTTP, such as physical objects or abstract notions. For example, http://dbpedia.org/resource/Czech_Republic is a URI of a non-information resource representing the Czech Republic. Since the owner of a URI of a non-information resource cannot serve the identified resource itself to the user requesting the URI, a recommended, yet widely disputed, practice is to reply with the HTTP 303 See Other status code, redirecting users to a URI of a representation of the non-information resource [5].
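The dereferencing flow described above can be sketched with Python's standard library. This is only an illustration under stated assumptions: the local test server, its /resource/ and /data/ paths, the example.org triple and the text/turtle response are all hypothetical and do not describe how DBpedia or any other real server is configured.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical Turtle description served for the illustrative resource.
TURTLE = b"<http://example.org/resource/Czech_Republic> a <http://example.org/Country> .\n"


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path.startswith("/resource/"):
            # Non-information resource: answer 303 See Other and point
            # the client to a document that describes the resource.
            self.send_response(303)
            self.send_header("Location", self.path.replace("/resource/", "/data/", 1))
            self.send_header("Content-Length", "0")
            self.end_headers()
        elif self.path.startswith("/data/"):
            # Information resource: send a representation with its MIME type.
            self.send_response(200)
            self.send_header("Content-Type", "text/turtle; charset=utf-8")
            self.send_header("Content-Length", str(len(TURTLE)))
            self.end_headers()
            self.wfile.write(TURTLE)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def dereference(uri):
    """Issue a GET request, follow redirects, return (final URI, MIME type, body)."""
    # An empty ProxyHandler ignores any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    request = urllib.request.Request(uri, method="GET")
    with opener.open(request) as response:
        return response.url, response.headers.get_content_type(), response.read()


def demo():
    """Start the toy server on a free local port and dereference one URI."""
    server = HTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        base = "http://127.0.0.1:%d" % server.server_port
        return dereference(base + "/resource/Czech_Republic")
    finally:
        server.shutdown()
        server.server_close()
```

Calling demo() returns the URI of the describing document rather than the URI that was requested, which shows that the dereferencing was indirect: the client asked for the non-information resource and ended up with a representation of an information resource about it.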

Content negotiation

Content negotiation is a way of deciding on an appropriate response format based on the content of the HTTP request’s headers. HTTP clients can send HTTP headers along with the requested URI to provide context, stating which format of representation of the requested resource they prefer.
A common HTTP header used for this purpose in the linked data publication model is the Accept header, which contains an enumeration of the preferred MIME types for the representation of the requested resource. This pattern allows the client to negotiate with a server on the format of the server’s response that is appropriate for the actual communication context. In practice, this is how the server may distinguish between human and machine traffic and serve either a human-readable (e.g., HTML) or a machine-readable (e.g., XML) representation of the requested resource.
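How a server might pick a representation from the Accept header can be sketched as follows. This is a simplified illustration, not the complete algorithm from the HTTP specification: it orders the client's preferences by their q-values only and ignores the rule that more specific media ranges take precedence. The function names and the list of available formats are assumptions made for the example.

```python
def parse_accept(header):
    """Return (media_range, q) pairs from an Accept header, highest q first."""
    preferences = []
    for item in header.split(","):
        parts = [part.strip() for part in item.split(";")]
        media_range = parts[0].lower()
        if not media_range:
            continue
        q = 1.0  # a missing q parameter means full preference
        for parameter in parts[1:]:
            name, _, value = parameter.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        preferences.append((media_range, q))
    # The sort is stable, so ranges with equal q keep their header order.
    return sorted(preferences, key=lambda pair: pair[1], reverse=True)


def matches(media_range, media_type):
    """Check whether a concrete MIME type falls within a media range."""
    if media_range in ("*/*", media_type):
        return True
    main, _, sub = media_range.partition("/")
    return sub == "*" and media_type.startswith(main + "/")


def negotiate(accept_header, available):
    """Pick the first available MIME type acceptable to the client, or None."""
    for media_range, q in parse_accept(accept_header):
        if q <= 0:
            continue
        for media_type in available:
            if matches(media_range, media_type):
                return media_type
    return None
```

With the available formats text/html, application/rdf+xml and text/turtle, a browser-like header that starts with text/html selects the HTML page, whereas a client that sends "text/turtle, application/rdf+xml;q=0.9" receives Turtle. When nothing matches, the function returns None, in which case a server would typically answer with 406 Not Acceptable or fall back to a default format.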
The principles of content negotiation offer a generic approach to communicating the client’s preferences. A widespread use of content negotiation is the Accept-Language header, which may be used to indicate the preferred language of the response. A novel use of this method is datetime content negotiation, which allows the client to access different time snapshots of data using the Accept-Datetime header and which is implemented in the Memento software.
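On the client side, both kinds of preferences are ordinary request headers. The sketch below builds them with Python's standard library; the function name and its parameters are hypothetical, and it assumes that the snapshot time is a timezone-aware datetime that should be sent in the HTTP date format.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def negotiation_headers(language=None, snapshot=None):
    """Build request headers that state language and datetime preferences.

    `language` is a ready-made Accept-Language value such as "cs, en;q=0.8".
    `snapshot` must be a timezone-aware datetime; it is converted to GMT.
    """
    headers = {}
    if language:
        headers["Accept-Language"] = language
    if snapshot:
        in_utc = snapshot.astimezone(timezone.utc)
        # usegmt=True produces the "GMT" suffix used by HTTP dates.
        headers["Accept-Datetime"] = format_datetime(in_utc, usegmt=True)
    return headers
```

The resulting dictionary can be passed as the headers argument of urllib.request.Request. Whether a server honours Accept-Datetime depends on its support for datetime negotiation; servers without it will simply ignore the header.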
There are multiple ways and levels at which content negotiation may be implemented. A common way to do it is by configuring the HTTP server, for example with the Apache HTTPD’s mod_rewrite. A recommended way to enable discovery of the supported types of representations is to use the link HTML element with a link typed “alternate” and the type attribute describing a MIME type that the server is capable of responding with.
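A client can discover the advertised representations by reading these link elements from the HTML page. The following is a minimal sketch using Python's standard HTML parser; the class and function names are made up for the example, and any markup fed to it is the caller's own.

```python
from html.parser import HTMLParser


class AlternateLinkParser(HTMLParser):
    """Collect (MIME type, href) pairs from link elements typed "alternate"."""

    def __init__(self):
        super().__init__()
        self.alternates = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # The rel attribute may hold several space-separated link types.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "alternate" in rel_values and attributes.get("type"):
            self.alternates.append((attributes["type"], attributes.get("href")))


def find_alternates(html):
    """Return the alternate representations advertised in an HTML document."""
    parser = AlternateLinkParser()
    parser.feed(html)
    return parser.alternates
```

Given a page whose head contains a link element with rel="alternate" and type="text/turtle", find_alternates returns that MIME type together with the linked URI, which the client can then request directly or use to decide what to put in its Accept header.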

References

  1. RFC 2616. Hypertext Transfer Protocol: HTTP/1.1 [online]. FIELDING, Roy Thomas; GETTYS, J.; MOGUL, J.; FRYSTYK, H.; MASINTER, L.; LEACH, P.; BERNERS-LEE, Tim. June 1999 [cit. 2012-04-21], 176 p. Available from WWW: http://tools.ietf.org/html/rfc2616. ISSN 2070-1721.
  2. FIELDING, Roy Thomas. Architectural styles and the design of network-based software architectures. Irvine (CA), 2000. 162 p. Dissertation (PhD.). University of California, Irvine.
  3. HYLAND, Bernadette; TERRAZAS, Boris Villazón; CAPADISLI, Sarven. Cookbook for open government linked data [online]. Last modified on February 20th, 2012 [cit. 2012-04-11]. Available from WWW: http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
  4. JACOBS, Ian; WALSH, Norman (eds.). Architecture of the World Wide Web, volume 1 [online]. W3C Recommendation. December 15th, 2004 [cit. 2012-04-20]. Available from WWW: http://www.w3.org/TR/webarch/
  5. HEATH, Tom; BIZER, Chris. Linked data: evolving the Web into a global data space. 1st ed. Morgan & Claypool, 2011. Also available from WWW: http://linkeddatabook.com/book. ISBN 978-1-60845-430-3. DOI 10.2200/S00334ED1V01Y201102WBE001.
