2011-07-19

Spoonfeeding Google with RDF graphs packaged as trees

During a small side project I found out that the Google Rich Snippets Testing Tool doesn't treat RDFa as RDF (i.e., a graph) but rather as a simple hierarchical structure (i.e., a tree). It doesn't take into account the links expressed in RDFa, only the way HTML elements are nested inside one another. More about the difference between the graph and tree data models can be found in a blog post by Lin Clark.

I've created two documents that give the same RDF when you run an RDFa distiller on them. Both contain GoodRelations product data. The difference is that in the first document the HTML element describing the price specification (gr:UnitPriceSpecification) is not nested inside the HTML element describing the offering (gr:Offering); instead, the offering links to it via the gr:hasPriceSpecification property. In the second document the HTML element with the price specification is nested inside the element describing the offering.
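To make the difference concrete, here is a minimal sketch in Python. The two snippets are hypothetical stand-ins for the test documents (not the original markup); both should distill to the same triples, yet a reader that only looks at element nesting sees them differently. The helper names (TypedNesting, typed_nesting) are made up for this illustration.

```python
from html.parser import HTMLParser

# Hypothetical markup, not the original test documents.
# Variant 1: the price specification is a sibling, linked via rel + resource.
LINKED = """
<div xmlns:gr="http://purl.org/goodrelations/v1#">
  <div about="#offer" typeof="gr:Offering">
    <span rel="gr:hasPriceSpecification" resource="#price"></span>
  </div>
  <div about="#price" typeof="gr:UnitPriceSpecification">
    <span property="gr:hasCurrencyValue">19.99</span>
  </div>
</div>
"""

# Variant 2: the price specification is nested inside the offering.
NESTED = """
<div xmlns:gr="http://purl.org/goodrelations/v1#">
  <div about="#offer" typeof="gr:Offering">
    <div rel="gr:hasPriceSpecification">
      <div about="#price" typeof="gr:UnitPriceSpecification">
        <span property="gr:hasCurrencyValue">19.99</span>
      </div>
    </div>
  </div>
</div>
"""


class TypedNesting(HTMLParser):
    """Record each typed element with the type of its nearest typed ancestor."""

    def __init__(self):
        super().__init__()
        self.stack = []  # typeof value (or None) for every open element
        self.pairs = []  # (typeof, typeof of nearest typed ancestor or None)

    def handle_starttag(self, tag, attrs):
        typeof = dict(attrs).get("typeof")
        if typeof:
            parent = next((t for t in reversed(self.stack) if t), None)
            self.pairs.append((typeof, parent))
        self.stack.append(typeof)

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()


def typed_nesting(html):
    """Return (type, enclosing type) pairs as a purely tree-based reader sees them."""
    parser = TypedNesting()
    parser.feed(html)
    parser.close()
    return parser.pairs


print(typed_nesting(LINKED))
print(typed_nesting(NESTED))
```

In the linked variant the price specification has no typed ancestor, so a nesting-only reader has no reason to connect it to the offering. In the nested variant it sits inside gr:Offering.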

Even though the documents contain the same data, the Google Rich Snippets Testing Tool parses them differently: it refuses to show a search result preview for the first document, whereas the second document produces one. In the first case, the price information is not recognized because it's not nested inside the HTML element describing the offering, and so a warning is shown:

Warning: In order to generate a preview, either price or review or availability needs to be present.

This leads me to believe that the Google Rich Snippets Testing Tool doesn't parse RDFa as RDF, but as a tree (much like a DOM tree), effectively the same way as HTML5 microdata, which is built on the tree model. In other words, Google doesn't use RDFa as RDF, but as microdata.
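The two readings can be contrasted with a toy model. This is only a sketch of the observed behaviour, not of how Google's tool is actually implemented; the identifiers (#offer, #price), the value 19.99, and all function names are assumptions made for illustration.

```python
# The same facts as a graph: a set of (subject, predicate, object) triples.
TRIPLES = {
    ("#offer", "rdf:type", "gr:Offering"),
    ("#offer", "gr:hasPriceSpecification", "#price"),
    ("#price", "rdf:type", "gr:UnitPriceSpecification"),
    ("#price", "gr:hasCurrencyValue", "19.99"),
}


def node(type_, value=None, children=()):
    """Build one tree node: a typed item with an optional value and nested items."""
    return {"type": type_, "value": value, "children": list(children)}


# Packaging 1: the price specification is a sibling of the offering.
SIBLING_TREE = node("root", children=[
    node("gr:Offering"),
    node("gr:UnitPriceSpecification", value="19.99"),
])

# Packaging 2: the price specification is nested inside the offering.
NESTED_TREE = node("root", children=[
    node("gr:Offering", children=[
        node("gr:UnitPriceSpecification", value="19.99"),
    ]),
])


def price_from_graph(triples, offer="#offer"):
    """Graph reading: follow the gr:hasPriceSpecification link from the offering."""
    specs = {o for s, p, o in triples
             if s == offer and p == "gr:hasPriceSpecification"}
    values = [o for s, p, o in triples
              if s in specs and p == "gr:hasCurrencyValue"]
    return values[0] if values else None


def find(tree, type_):
    """Depth-first search for the first node of the given type."""
    if tree["type"] == type_:
        return tree
    for child in tree["children"]:
        hit = find(child, type_)
        if hit:
            return hit
    return None


def price_from_tree(tree):
    """Tree reading: only look at what is nested inside the offering."""
    offering = find(tree, "gr:Offering")
    spec = find(offering, "gr:UnitPriceSpecification") if offering else None
    return spec["value"] if spec else None


print(price_from_graph(TRIPLES))      # 19.99, regardless of packaging
print(price_from_tree(SIBLING_TREE))  # None: the price is outside the offering
print(price_from_tree(NESTED_TREE))   # 19.99
```

The graph reading does not care where the price specification appears in the markup, because it follows the link. The tree reading finds the price only when it is physically nested inside the offering.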

Eric Hellman wrote a blog post about spoonfeeding data to Google. Even though Google still accepts some RDF (e.g., GoodRelations) after the announcement of the microdata-based Schema.org, it wants to be spoonfed RDF graphs packaged as microdata trees. Does this mean that if Google is the primary consumer of your data, you shouldn't bother packaging your RDF in trees, and should instead provide your data directly as a tree in HTML5 microdata?

2011-07-03

RDFa in action

RDFa is a way to exchange structured data inside HTML documents. RDFa provides information that is formalized enough for computers (such as googlebot) to process it in an automated way. RDFa is a complete serialization of RDF that uses attribute-value pairs to embed data into HTML documents in a way that does not affect their visual display. RDFa is a hack built on top of HTML. It repurposes some of the standard HTML attributes (such as href, src or rel) and adds new ones (such as property, about or typeof) to enrich HTML with semantic mark-up.

A good way to start with RDFa is to read through some of the documents about it, such as the RDFa Primer or even the RDFa specification. When you want to annotate an HTML document with RDFa, it helps to go through a series of steps. We used this workflow during an RDFa workshop I helped to organize, and the recipe worked quite well. Here it is.

  1. Decide what you want to describe (e.g., your personal profile).
  2. Find which RDF vocabularies can be used to describe such a thing (e.g., FOAF). There are multiple ways to discover suitable vocabularies, some of which are listed at the W3C website for Ontology Dowsing.
  3. Start editing your HTML: either the static files or the dynamically rendered templates.
  4. Start at the first line of your document and set the correct DOCTYPE. If you are using XHTML, use <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd"> (i.e., RDFa 1.0). If you are using HTML5, use <!doctype html> (i.e., RDFa 1.1). This will allow you to validate your document and see if you are using RDFa correctly.
  5. Refer to the used RDF vocabularies. By declaring vocabularies' namespaces you can set up variables that you can use in compact URIs. If you are using XHTML, use the xmlns attribute (e.g., xmlns:dv="http://rdf.data-vocabulary.org/#"). If you are using HTML5, use prefix, vocab, or profile attributes (e.g., prefix="dv: http://rdf.data-vocabulary.org/#").
  6. Identify the thing you want to describe. Use a URI as a name for the thing so that others can link to it. Use the about attribute (e.g., <body about="http://example.com/recipe">). Everything that is nested inside of the HTML element with the about attribute is the description of the identified thing, unless a new subject of description is introduced via a new about attribute.
  7. Use the typeof attribute to express what kind of thing you are describing (e.g., <body about="http://example.com/recipe" typeof="dv:Recipe">). Pick a suitable class from the RDF vocabularies you have chosen to use and define the thing you describe as an instance of this class. Note that every time the typeof attribute is used the subject of description changes.
  8. Use the property, rel and rev attributes to name the properties of the thing you are describing (e.g., <h1 property="name">).
  9. Assign values to the properties of the described thing using either the textual content of the annotated HTML element or an attribute such as content, href, resource or src (e.g., <h1 property="name">RDFa in action</h1> or <span property="v:author" rel="dcterms:creator" resource="http://keg.vse.cz/resource/person/jindrich-mynarz">Jindřich Mynarz</span>).
  10. If you have used the textual content of an HTML element as the value of a property of the thing described, you can annotate it further. To define the language of the text, use either the xml:lang (in XHTML) or lang (in HTML5) attribute (e.g., <h1 property="name" lang="en">RDFa in action</h1>). If you want to set the datatype of the value, use the datatype attribute (e.g., <span content="2011-07-03" datatype="xsd:date">July 3, 2011</span>).
  11. Check your RDFa-annotated document using validators and examine the data using RDFa distillers to see if you have got it right.
  12. Publish the annotated HTML documents on the Web. Ping the RDFa consumers such as search engines so that they know about your RDFa-annotated web pages.
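Put together, the steps above might produce a page like the one embedded in the following sketch. The page is a hypothetical HTML5 example; the choice of Dublin Core properties (dcterms:title, dcterms:creator, dcterms:created) is an assumption made for illustration. The small Python helper around it merely lists which RDFa attributes each element carries, as a quick sanity check. It does not replace a real validator or distiller.

```python
from html.parser import HTMLParser

# Hypothetical page assembled from the steps above (HTML5 with RDFa 1.1).
PAGE = """<!doctype html>
<html prefix="dv: http://rdf.data-vocabulary.org/# dcterms: http://purl.org/dc/terms/ xsd: http://www.w3.org/2001/XMLSchema#">
<head><title>RDFa in action</title></head>
<body about="http://example.com/recipe" typeof="dv:Recipe">
  <h1 property="dcterms:title" lang="en">RDFa in action</h1>
  <span rel="dcterms:creator" resource="http://keg.vse.cz/resource/person/jindrich-mynarz">Jindřich Mynarz</span>
  <span property="dcterms:created" content="2011-07-03" datatype="xsd:date">July 3, 2011</span>
</body>
</html>
"""

# Attributes that RDFa adds or repurposes (a subset, enough for this sketch).
RDFA_ATTRS = ("about", "typeof", "property", "rel", "rev",
              "resource", "content", "datatype", "prefix")


class RDFaAttributes(HTMLParser):
    """Collect the RDFa-related attributes found on each element."""

    def __init__(self):
        super().__init__()
        self.found = []  # list of (tag, {attribute: value}) in document order

    def handle_starttag(self, tag, attrs):
        used = {name: value for name, value in attrs if name in RDFA_ATTRS}
        if used:
            self.found.append((tag, used))


def rdfa_attributes(html):
    """Return the annotated elements of an HTML string in document order."""
    parser = RDFaAttributes()
    parser.feed(html)
    parser.close()
    return parser.found


for tag, used in rdfa_attributes(PAGE):
    print(tag, used)
```

Reading the output top to bottom follows the recipe: the prefix declarations come first, then the subject and its type, then the properties with their values, language and datatype.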

Art of emptiness

Marshall McLuhan created a distinction between "hot" and "cool" media. I think it is a productive conceptualization of media because it stimulates thinking, even though it suggests thinking in terms of binary opposites.

The longer I enjoy art, the more I tend to prefer "cool art". The following is a comparison of hot and cool styles of art, with a particular focus on music. I hope this will not result in death by metaphor, but rather in a productive use of it. First, let's start with what McLuhan called the "hot media".

Hot art

Hot art is an art of sensory overload. It provides rich, overwhelming, super-stimuli that lower our ability to parse our sensory input. Hot art needs a space to inhabit; it is an environment-seeking art. Art is always situated in a host environment, in a wider context; and hot art needs space to live in. For instance, for visual arts it is the space of plain, white walls in an art gallery.

Hot art enforces a single interpretation; it is not open to creative use. It guides a person through a linear, pre-defined experience, without a need for participation. In this way, it achieves a temporary oblivion by means of hypnosis. The source of super-stimulation occupies our brain, blocks any other input, and forces the person to pay attention only to it.

For the most part, hot art is perceived on the conscious level. Hot art is a digitally mastered, manufactured product that is made to achieve the maximum effect possible. The result of such a process feels artificial, perfect, and error-free.

A typical example of hot art is pop music. This manifests itself, for example, in the "wall of sound" method, which uses many different layers of sound to provide a compelling listening experience.

Cool art

On the other hand, cool art is an art of sensory deprivation. It uses under-stimulation to create emptiness. Cool art creates space, and thus it is an environment-creating art as it puts the person perceiving it in an environment of its own.

Cool art is open and invites a multiplicity of interpretations. It inspires people to undergo a non-linear experience, while requiring a high level of active participation. Participatory art evokes hallucination, which manifests itself as a furious fill-in or completion of sense when all outer sensation is withdrawn (source). Left with minimal sensory input, the human mind starts to create its own content. This is a mechanical process, a natural reaction to under-stimulation of the sensory apparatus. Left alone, the mind tends to wander, fill in the blanks, and complete the missing parts. Cool art inspires creation by means of hallucination.

The experience of cool art is mostly an unconscious one. In contrast to hot art, it is based on analogue, non-discrete forms, which grow in organic ways. For instance, this can be achieved through field recordings or by employing non-deterministic or random processes. Such art is in a way more natural; it embraces error (cf. the aesthetics of glitch in music).

A typical example of cool art is dub techno. Dub techno got rid of the usual elements of music, such as melody, and confined itself to conveying music mostly through subtle, slowly evolving changes of rhythm or timbre. This is a minimalism that manifests itself through extensive repetition and through limiting itself to the expressive power of bare rhythm.

I prefer cool art to hot art. However, this is a matter of taste, which implies it may change. To conclude, let me give you a couple of examples of what I consider to be cool art.

Visual arts: Unloud painting

Cinema: Stalker by Andrei Tarkovsky

Music: cv313 - Subtraktive (Soultek's Stripped Down Dub)