blog.mynarz.net: April 2011

2011-04-25

Library SEO

When people search for information, they are very likely to start at Google. They don't start at a library like they used to. What this means for libraries is that, if they don't want to be bypassed, it's important that there is a path from Google to a library. If you can't make people start searching at your library, at least you can make the path from Google to your library as prominent and as accessible as possible. The paths that Google serves in nice, ordered lists, are URIs. On the Web, the path to a library is a link.

This is why it's important to make the things at libraries linkable. Then not only Google but everyone can link to them. If people start to link to your library's website, Google perceives this as a good thing. The number of inbound links to a web page is known to be a key factor in how important Google considers it.

At present, things in libraries often don't have URIs. Or they have them, but poor ones: unstable, session-based URIs that change with every request. You can't link to such content. It's like a library that prevents you from telling anyone about it. Be aware that Google is a very important user of your library. If it likes your web content, it will tell lots of other people about you. Word of mouth is powerful, but word of Google is more powerful.

Steps for library SEO

Provide every piece of your content with a URI.

If your content doesn't have a unique URI, it cannot be linked and it won't show in Google's search results. On the Web, content without a URI does not exist.
Make that URI a stable URI.

Not that dynamic session-based nonsense. Make an effort to sustain the URI in the long term. A URI should always resolve. Provide re-directs if you change your URI's structure (e.g., by using PURL).
Make that URI easy to use.

A short URI that fits in a tweet or can be read from slides is better than a long or unreadable one. If it's easy to use, it'll be used more.
Make that URI a cool one.

Implement content negotiation. In this way, both humans and machines get what they like.
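
To make the last step more concrete, here is a minimal sketch of content negotiation in Python. It assumes a library record is available as an HTML page for people and as RDF for machines; the list of media types and the helper names parse_accept and negotiate are made up for illustration, and the matching is a simplification of what HTTP actually specifies.

```python
# A simplified content-negotiation sketch. The helper names and the list of
# available media types are illustrative assumptions, not a library API.

def parse_accept(header):
    """Parse an HTTP Accept header into (media_type, q) pairs, best first."""
    choices = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # a missing q parameter means full preference
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0
        choices.append((media_type, q))
    # sorted() is stable, so entries with equal q keep the client's order.
    return sorted(choices, key=lambda c: c[1], reverse=True)


def negotiate(accept_header, available):
    """Return the first available media type the client accepts, or None."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type in available:
            return media_type
        if media_type == "*/*":
            return available[0]
        if media_type.endswith("/*"):
            prefix = media_type[:-1]  # "text/*" becomes "text/"
            for candidate in available:
                if candidate.startswith(prefix):
                    return candidate
    return None


# Representations the same URI could serve (an assumption for this sketch).
AVAILABLE = ["text/html", "application/rdf+xml", "text/turtle"]

browser = negotiate("text/html,application/xhtml+xml;q=0.9,*/*;q=0.8", AVAILABLE)
crawler = negotiate("application/rdf+xml, text/turtle;q=0.5", AVAILABLE)
```

With these inputs a browser ends up with the HTML page and a data-hungry crawler with RDF/XML, both from the same URI. A real server would also send a Vary: Accept header and answer 406 Not Acceptable when nothing matches.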

2011-04-23

Removable Web

First, there was the Readable Web. It started as a one-to-many conversation accessible to anyone who had an internet connection. Masses of internet users were allowed to read what the elite few put on-line.

But for many people, the permission to read wasn't enough. The result of that was the Writeable Web. This was a many-to-many conversation, and the Web started to fill with user-generated content. Everyone with an internet connection was able to both read from and write to the Web.

Increasingly, the Web was not only a web of documents but also a web of applications. Next up was the Executable Web. Web applications exposed standard interfaces (APIs). The new paradigms of software as a service and the Web as a platform started to get attention. The Web was available for anyone with an internet connection to read, write and execute.

Now that we've got all of these permissions to the Web, we are able to do lots of powerful things; we can even do damage with them. However, it seems we still lack one permission: the permission to delete.

On the Web everything is recorded and stored (forever). To forget is human, to remember is Google. Every time we use the Web, we leave digital trails. Our foot-print gets stored. For instance, Google stores all queries entered into its search box, even though they get anonymized after 9 months. As another example, Facebook doesn't allow you to delete your account permanently; you only have the permission to de-activate it.

Digital information is not prone to disappear. And we have the methods of digital preservation, such as LOCKSS, to fight the processes of natural degradation of digital information, such as bit-rot (or link-rot), to make it even harder for it to get lost. And even though the bits vanish and links break, these are the natural phenomena of the Web, not something one can control and use on purpose.

The next step in endowing users with more permissions might be the Removable Web, where everyone is able to delete their own content from the Web. Already, there are some people who enjoy the authority to delete the content of others from the Web. We call it filtering or censorship. There are even some people who can, at least temporarily, remove the whole Web, as the recent example of the internet being switched off in Egypt shows.

We could benefit from the ability to remove our content. We could get rid of all the embarrassing photos and statuses we have ever posted. Forgetting is an essential human virtue that enables us to learn from our mistakes, get a second chance, and re-establish our reputation. Forgetting also helps us to forgive, wrote Viktor Mayer-Schönberger, the author of the book Delete: The Virtue of Forgetting in the Digital Age.

With great power comes great responsibility, as a saying often attributed to Voltaire (and also to Spider-Man) goes. Therefore, we should be more conscious of what we write to the Web, knowing that it will stay there. The following is often true on the Web: What was published cannot be unpublished. The Web doesn't forget.

We should have a right to delete our own content. Or, as Mayer-Schönberger suggests, we could add expiration dates for digital information. This shouldn't be that hard. We already have a verb for it. Just as we can HTTP GET something to read, HTTP POST to share something, we should be able to HTTP DELETE what we've published.
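
As a thought experiment, the three verbs plus expiration dates can be sketched as a toy in-memory store in Python. This is not a web server: RemovableStore and its method names are hypothetical, and the returned numbers merely mirror the HTTP status codes a real service might use.

```python
# A toy in-memory "Removable Web". RemovableStore and its methods are
# hypothetical; the integers mirror commonly used HTTP status codes.
import time


class RemovableStore:
    def __init__(self, clock=time.time):
        self._clock = clock      # injectable, so expiry can be simulated
        self._resources = {}     # uri -> (author, body, expires_at or None)

    def post(self, uri, author, body, ttl=None):
        """Publish content, optionally with an expiration date (ttl in seconds)."""
        expires_at = None if ttl is None else self._clock() + ttl
        self._resources[uri] = (author, body, expires_at)
        return 201  # Created

    def get(self, uri):
        """Read content; expired content is forgotten on access."""
        entry = self._resources.get(uri)
        if entry is None:
            return 404, None  # Not Found
        author, body, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._resources[uri]
            return 410, None  # Gone
        return 200, body

    def delete(self, uri, requester):
        """Only the author may delete their own content."""
        entry = self._resources.get(uri)
        if entry is None:
            return 404  # Not Found
        if entry[0] != requester:
            return 403  # Forbidden: deleting the content of others
        del self._resources[uri]
        return 204  # No Content
```

In this sketch, store.post("/photos/1", "alice", "...") followed by store.delete("/photos/1", "alice") removes the photo, while the same delete requested by anyone else is refused. Content posted with a ttl disappears on its own once the clock passes its expiration date.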

2011-04-10

Data-driven e-commerce with GoodRelations

On April 6th at the University of Economics, Prague, Martin Hepp gave a talk entitled Advertising with Linked Data in Web Content: From Semantic SEO to E-Commerce on the Web. Martin presented his view of the current situation in e-commerce and how it can be made better through structured data, illustrating it with GoodRelations, the ontology he has created.

GoodRelations

GoodRelations is an ontology describing the domain of electronic commerce. For instance, it can be used to express an offering of a product, specify a price, or describe a business and the like. The author and active maintainer of GoodRelations is Martin Hepp. As he shared in his talk, there are actually quite a lot of features that set it apart from other ontologies.

It's the only ontology whose use someone has paid for. At Overstock.com, an expert was hired to consult on the use of GoodRelations.
It's not only a research project. It's been accepted by the e-commerce industry and it's used by companies such as BestBuy or O'Reilly Media.
Its design is driven mainly by practice and real use cases and not only by research objectives. For instance, it was amended when Google requested minor changes. Google even stopped recommending the vocabulary it had created for the domain of e-commerce in favour of GoodRelations. It's the piece of the semantic web Google has chosen. Nonetheless, it's still an OWL-compliant ontology.
It comes with a healthy ecosystem around it. The ontology has thorough documentation with lots of examples and recipes that you can adopt and fine-tune to your specific use case. Validators are available for the ontology, and there are plenty of e-shop extensions and tools built for GoodRelations.
Finally, it's not only a product of necessity. As Martin Hepp said, he actually quite enjoys doing it.
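
To give an idea of what such a description looks like, the following Python sketch writes out a business, an offering and a price as Turtle using GoodRelations terms. The shop, the product, the price and the example.org URIs are invented, and offering_to_turtle is a hypothetical helper; it does no escaping of special characters, so treat it as an illustration and not as a serializer.

```python
# Builds a small GoodRelations description as Turtle text. The function name
# and all data passed to it are made up for this sketch; the gr: terms come
# from the GoodRelations vocabulary.

GR_PREFIXES = (
    "@prefix gr: <http://purl.org/goodrelations/v1#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
)


def offering_to_turtle(shop_uri, shop_name, offer_uri, product_name, price, currency):
    """Describe a business that offers one product for sale at a unit price.

    Note: values are inserted as-is; names containing quotes would need escaping.
    """
    return (
        f"{GR_PREFIXES}\n"
        f"<{shop_uri}> a gr:BusinessEntity ;\n"
        f'    gr:legalName "{shop_name}" ;\n'
        f"    gr:offers <{offer_uri}> .\n\n"
        f"<{offer_uri}> a gr:Offering ;\n"
        f'    gr:name "{product_name}" ;\n'
        f"    gr:hasBusinessFunction gr:Sell ;\n"
        f"    gr:hasPriceSpecification [\n"
        f"        a gr:UnitPriceSpecification ;\n"
        f'        gr:hasCurrency "{currency}" ;\n'
        f'        gr:hasCurrencyValue "{price:.2f}"^^xsd:float\n'
        f"    ] .\n"
    )


turtle = offering_to_turtle(
    shop_uri="http://example.org/shop#company",
    shop_name="Example Books Ltd.",
    offer_uri="http://example.org/shop/offers/42",
    product_name="Semantic Web for Shopkeepers (paperback)",
    price=19.9,
    currency="EUR",
)
print(turtle)
```

In practice the same statements would more likely be embedded in the shop's web pages, for example as RDFa, or generated by one of the e-shop extensions mentioned above.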

Product Ontology

The other project that was showcased by Martin Hepp is the Product Ontology. It's a dataset describing products that is derived from Wikipedia's pages. It contains several hundred thousand precise OWL DL class definitions of products. These class definitions are tightly coupled with Wikipedia: the edits in Wikipedia are reflected in the Product Ontology. For instance, if the Product Ontology doesn't list the type of product you sell, you can create a page for it in Wikipedia and, provided that it's not deleted, the product type will appear in the Product Ontology within 24 hours. This is similar to the way the BBC uses Wikipedia. An added benefit is that it can also serve as a dictionary containing up to a hundred labels in different languages for a product, because Wikipedia bundles together the pages describing the same thing in different languages.
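
The coupling with Wikipedia can be illustrated with a few lines of Python. The sketch assumes that a Product Ontology class URI is formed by appending the name of the English Wikipedia page to http://www.productontology.org/id/; that pattern is an assumption of this example and not something stated in the talk, and wikipedia_to_pto is a hypothetical helper.

```python
# Maps an English Wikipedia page URL to the Product Ontology class URI that
# would correspond to it. The URI pattern in PTO_BASE is an assumption.
from urllib.parse import urlparse

PTO_BASE = "http://www.productontology.org/id/"  # assumed URI pattern


def wikipedia_to_pto(wikipedia_url):
    """Return the assumed Product Ontology class URI for a Wikipedia page URL."""
    parsed = urlparse(wikipedia_url)
    if parsed.netloc != "en.wikipedia.org" or not parsed.path.startswith("/wiki/"):
        raise ValueError(f"not an English Wikipedia page URL: {wikipedia_url}")
    page = parsed.path[len("/wiki/"):]
    if not page:
        raise ValueError("the URL does not name a page")
    return PTO_BASE + page


laptop_class = wikipedia_to_pto("https://en.wikipedia.org/wiki/Laptop")
```

A shop could then use such a class URI as the type of the product it offers, next to the GoodRelations terms describing the offer itself.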

Semantic SEO

The primary benefit of GoodRelations is in how it improves search. We spend more time searching than we ever used to. Martin Hepp said that there has been an order-of-magnitude increase in the time we spend searching. It takes us a long time before we finally find the thing we are interested in because current web search is a blunt instrument.
The World Wide Web acts as a giant information shredder. In databases, data are stored in a structured format. However, during transmission to web clients, that structure is lost. The data aren't sent as structured data but presented in a web page that can be read by a human customer, while machines can pretty much treat it only as a black box. Instead of being sent in the form in which it's stored in the database, the message loses its structure on the way to the client, and only the presentation of the content is delivered. This means that the agent accessing the data via the Web often needs to reconstruct and infer the original structure of the data.
Web search operates on a vast amount of data that is for the most part unstructured, and as such it doesn't provide the affordances to do anything clever. Simple HTML doesn't allow you to articulate your value proposition well. The products and services are often reduced to a price tag. Enter the semantic SEO.
Semantic SEO can be defined as using data to articulate your value proposition on the Web. It strives to preserve the specificity and richness of your value proposition when you need to send it over the Web. Ontologies such as GoodRelations allow you to describe your products and services with a high degree of precision.
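
The shredding can be shown in a few lines of Python. The product record and both render functions below are invented for this sketch, and JSON merely stands in for any structured format (GoodRelations data would typically travel as RDF): the HTML rendering keeps the values but drops the field names and types, while the structured rendering survives the trip intact.

```python
# The "information shredder" in miniature. The record and the two render
# functions are illustrative; JSON stands in for any structured format.
import json
from html import escape

product = {
    "name": "Trekking backpack",
    "price": 89.0,
    "currency": "EUR",
    "volume_litres": 45,
}


def render_html(item):
    """Presentation only: field names and types disappear into markup."""
    return (
        f"<div><h2>{escape(item['name'])}</h2>"
        f"<p>{item['price']:.2f} {escape(item['currency'])}, "
        f"{item['volume_litres']} l</p></div>"
    )


def render_structured(item):
    """Structure preserved: the client receives what the database stored."""
    return json.dumps(item, sort_keys=True)


html_page = render_html(product)
recovered = json.loads(render_structured(product))

print(html_page)             # a client has to guess which number is the price
print(recovered == product)  # the structured copy round-trips unchanged
```

A machine reading html_page has to infer that 89.00 is a price and 45 a volume; a machine reading the structured version doesn't have to infer anything.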

Specificity

We need cleverer and more powerful search engines because of the tremendous growth in specificity. Wealth fosters the differentiation of products and this in turn leads to an increased specificity. This means there is a plethora of various types of goods and services available on the shelves of markets and shops. The size of the type system we use has grown (in RDF-speak, this would be the number of different rdf:types). We're overloaded with the number of different product types we're able to choose from. It's the paradox of choice: faced with a larger number of goods, our ability to choose one of them goes down.
GoodRelations provides a way to annotate products and services on the Web so that search engines can deliver a better search experience to their users. It allows for deep search: a search that accepts very specific search queries and gives very precise answers. With GoodRelations you can retain the specificity of your offering and harness it in search. This makes it possible to target niche markets and reach customers with highly specific needs in the long tail.
We need better search engines built on the structured data on the Web to alleviate the analysis paralysis that results from us being overwhelmed by the number of things to choose from. The growing amount of GoodRelations-annotated data is a step towards a situation in which you'll be able to pose a specific question to a search engine and get a list of only the highly relevant results.
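
A deep search of this kind can be sketched in Python over a handful of structured offers. The offers, their property names and the deep_search function are all invented for this example; a real search engine would work over GoodRelations data harvested from the Web, not over a list of dictionaries.

```python
# A toy "deep search" over structured offers. The data and property names
# are hypothetical and stand in for GoodRelations-annotated offers.

OFFERS = [
    {"name": "City daypack", "type": "Backpack", "price": 35.0,
     "volume_litres": 20, "waterproof": False},
    {"name": "Trekking backpack", "type": "Backpack", "price": 89.0,
     "volume_litres": 45, "waterproof": True},
    {"name": "Expedition backpack", "type": "Backpack", "price": 210.0,
     "volume_litres": 70, "waterproof": True},
    {"name": "Dry bag", "type": "Bag", "price": 15.0,
     "volume_litres": 10, "waterproof": True},
]


def deep_search(offers, product_type, max_price=None, **required):
    """Return offers of the given type that match every required property."""
    results = []
    for offer in offers:
        if offer.get("type") != product_type:
            continue
        if max_price is not None and offer.get("price", float("inf")) > max_price:
            continue
        if all(offer.get(key) == value for key, value in required.items()):
            results.append(offer)
    # Cheapest first; offers without a price sort last.
    return sorted(results, key=lambda o: o.get("price", float("inf")))


hits = deep_search(OFFERS, "Backpack", max_price=100, waterproof=True)
print([offer["name"] for offer in hits])
```

The query "a waterproof backpack for at most 100" cannot be answered reliably from price tags in plain HTML, but it becomes a simple filter once the specifics of each offer are published as data.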
E-commerce applications and ontologies such as GoodRelations or the Product Ontology show a pragmatic approach to the use of semantic web technologies. Martin Hepp also mentioned his pragmatic view of linked data. In his opinion, the links that create the most business advantage are the most important. It was interesting to see parts of the semantic web that work. It seems we're headed to a future of data-driven e-commerce.