blog.mynarz.net: ecommerce

2011-04-10

Data-driven e-commerce with GoodRelations

On April 6th at the University of Economics, Prague, Martin Hepp gave a talk entitled Advertising with Linked Data in Web Content: From Semantic SEO to E-Commerce on the Web. Martin presented his view of the current state of e-commerce and how structured data can make it better, illustrating it with GoodRelations, the ontology he created.

GoodRelations

GoodRelations is an ontology describing the domain of electronic commerce. For instance, it can be used to express an offer of a product, specify its price, or describe a business. The author and active maintainer of GoodRelations is Martin Hepp. As he explained in his talk, quite a few features set it apart from other ontologies.

It's the only ontology for which someone has paid for consulting work: Overstock.com hired an expert to advise on its use of GoodRelations.
It's not only a research project. It has been adopted by the e-commerce industry and is used by companies such as BestBuy and O'Reilly Media.
Its design is driven mainly by practice and real use cases, not only by research objectives. For instance, it was amended when Google requested minor changes. Google even stopped recommending the vocabulary it had created for the domain of e-commerce in favour of GoodRelations; it's the piece of the semantic web Google has chosen. Nonetheless, it's still an OWL-compliant ontology.
It comes with a healthy ecosystem. The ontology has thorough documentation with lots of examples and recipes that you can adopt and fine-tune to your specific use case. Validators are available for it, and there are plenty of e-shop extensions and tools built for GoodRelations.
Finally, it's not only a product of necessity. As Martin Hepp said, he actually quite enjoys doing it.
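
To make this concrete, here is a minimal sketch of the kind of description GoodRelations enables: a business that offers a product at a given price, written out as Turtle. It uses only the Python standard library. The shop, the offer URIs and the price are hypothetical, and the helper functions are illustrative, not part of any GoodRelations tooling; the class and property names (gr:BusinessEntity, gr:offers, gr:Offering, gr:UnitPriceSpecification and so on) come from the GoodRelations vocabulary.

```python
GR = "http://purl.org/goodrelations/v1#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def offer_turtle(shop_uri: str, shop_name: str, offer_uri: str,
                 product_name: str, price: float, currency: str) -> str:
    """Describe a business that sells one product at a unit price (hypothetical data)."""
    return f"""@prefix gr: <{GR}> .
@prefix xsd: <{XSD}> .

<{shop_uri}> a gr:BusinessEntity ;
    gr:legalName {literal(shop_name)} ;
    gr:offers <{offer_uri}> .

<{offer_uri}> a gr:Offering ;
    gr:name {literal(product_name)} ;
    gr:hasBusinessFunction gr:Sell ;
    gr:hasPriceSpecification [
        a gr:UnitPriceSpecification ;
        gr:hasCurrency {literal(currency)} ;
        gr:hasCurrencyValue "{price:.2f}"^^xsd:float
    ] .
"""


print(offer_turtle("http://example.org/shop#business", "Example Shop Ltd.",
                   "http://example.org/shop#offer-e450", "Olympus E-450",
                   399.0, "EUR"))
```

Generating the Turtle as a string keeps the sketch dependency-free; a real shop extension would more likely use an RDF library and its own serializer.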

Product Ontology

The other project Martin Hepp showcased is the Product Ontology, a dataset describing products that is derived from Wikipedia pages. It contains several hundred thousand precise OWL DL class definitions of products. These class definitions are tightly coupled with Wikipedia: edits in Wikipedia are reflected in the Product Ontology. For instance, if the Product Ontology doesn't list the type of product you sell, you can create a page for it in Wikipedia and, provided the page isn't deleted, the product type will appear in the Product Ontology within 24 hours. This is similar to the way the BBC uses Wikipedia. An added benefit is that it can also serve as a dictionary, containing up to a hundred labels in different languages for a product, because Wikipedia links together the pages describing the same thing in different languages.
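
Because the Product Ontology mirrors Wikipedia, a product class can be derived from the title of an English Wikipedia article. The sketch below assumes the URI pattern http://www.productontology.org/id/ followed by the article title with spaces replaced by underscores (e.g. .../id/Camera); treat the exact normalisation rules as an assumption. It also shows one way such a class might be attached to a GoodRelations offer: gr:includes pointing to a product typed with the Product Ontology class.

```python
from urllib.parse import quote

PTO = "http://www.productontology.org/id/"  # assumed base URI of Product Ontology classes


def pto_class_uri(wikipedia_title: str) -> str:
    """Map an English Wikipedia article title to a Product Ontology class URI.

    Roughly mimics Wikipedia's title normalisation: trim, replace spaces with
    underscores, upper-case the first letter; other characters are percent-encoded.
    """
    title = wikipedia_title.strip().replace(" ", "_")
    title = title[:1].upper() + title[1:]
    return PTO + quote(title, safe="_()")


def typed_product_turtle(offer_uri: str, wikipedia_title: str) -> str:
    """Turtle stating that an offer includes a product of the given Product Ontology type."""
    return (
        "@prefix gr: <http://purl.org/goodrelations/v1#> .\n"
        f"<{offer_uri}> a gr:Offering ;\n"
        f"    gr:includes [ a <{pto_class_uri(wikipedia_title)}> ] .\n"
    )


print(pto_class_uri("digital single-lens reflex camera"))
print(typed_product_turtle("http://example.org/offer1", "Hammer"))
```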

Semantic SEO

The primary benefit of GoodRelations is in how it improves search. We spend more time searching than we ever used to; Martin Hepp said that the time we spend searching has increased by an order of magnitude. It takes us a long time to finally find the thing we are interested in, because current web search is a blunt instrument.
The World Wide Web acts as a giant information shredder. In databases, data are stored in a structured format, but this structure is lost during transmission to web clients. The data aren't sent as structured data; they are presented in a web page that a human customer can read but that machines can treat pretty much only as a black box. Instead of arriving in the form in which it's stored in the database, the message is broken up on its way through the web infrastructure: the structure of the data gets lost and only the presentation of the content is delivered. This means that an agent accessing the data via the Web often needs to reconstruct and infer the original structure of the data.
Web search operates on a vast amount of data that is for the most part unstructured, and as such it doesn't afford anything clever. Simple HTML doesn't allow you to articulate your value proposition well: products and services are often reduced to a price tag. Enter semantic SEO.
Semantic SEO can be defined as using data to articulate your value proposition on the Web. It strives to preserve the specificity and richness of your value proposition when you need to send it over the Web. Ontologies such as GoodRelations allow you to describe your products and services with a high degree of precision.
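
A toy illustration of the shredding described above, assuming a hypothetical shop template: the database record has named fields, but the page sent to the client contains only text, so a machine has to guess the structure back with a heuristic. The regular expression is one arbitrary guess, and it breaks as soon as the page mentions another amount.

```python
import html
import re

# A record as the shop stores it: every value has a name (hypothetical data).
record = {"name": "Olympus E-450", "price": 399.0, "currency": "EUR"}


def render_plain_html(rec: dict) -> str:
    """Hypothetical shop template: the field names never reach the client."""
    return (f"<div><h2>{html.escape(rec['name'])}</h2>"
            f"<p>Now only {rec['price']:.2f} {html.escape(rec['currency'])}!</p></div>")


def scrape_price(page: str):
    """Guess the price: take the first number followed by a three-letter currency code."""
    match = re.search(r"(\d+(?:\.\d+)?)\s+([A-Z]{3})\b", page)
    return (float(match.group(1)), match.group(2)) if match else None


page = render_plain_html(record)
print(page)
print(scrape_price(page))                                  # the guess happens to work here
print(scrape_price("<p>Save 50 EUR! Now 399.00 EUR</p>"))  # and picks the wrong amount here
```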

Specificity

We need cleverer, more powerful search engines because of the tremendous growth in specificity. Wealth fosters the differentiation of products, and this in turn leads to increased specificity: there is a plethora of types of goods and services available on the shelves of markets and shops. The size of the type system we use has grown (in RDF-speak, this would be the number of different rdf:types). We're overwhelmed by the number of different product types we can choose from. It's the paradox of choice: faced with a larger number of goods, we find it harder to choose one of them.
GoodRelations provides a way to annotate products and services on the Web so that search engines can use the annotations to deliver a better search experience to their users. It enables deep search: a search that accepts very specific queries and gives very precise answers. With GoodRelations you can retain the specificity of your offering and harness it in search. This opens up the possibility of targeting niche markets and reaching customers with highly specific needs in the long tail.
We need better search engines built on the structured data on the Web to alleviate the analysis paralysis that results from being overwhelmed by the number of things to choose from. The growing amount of GoodRelations-annotated data is a step toward a situation in which you'll be able to pose a specific question to a search engine and get a list of only highly relevant results.
E-commerce applications and ontologies such as GoodRelations or the Product Ontology show a pragmatic approach to using semantic web technologies. Martin Hepp also mentioned his pragmatic view of linked data: in his opinion, the links that create the most business advantage are the most important. It was interesting to see parts of the semantic web that work. It seems we're headed toward a future of data-driven e-commerce.
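
The deep search described under Specificity can be sketched as filtering structured offers on several precise criteria at once, something keyword matching over plain text cannot do reliably. The Offer fields, the feature names and the data below are hypothetical; a real engine would query GoodRelations data crawled from many shops rather than an in-memory list.

```python
from dataclasses import dataclass, field


@dataclass
class Offer:
    """A simplified, hypothetical view of a GoodRelations offer."""
    uri: str
    product_type: str              # e.g. a Product Ontology class name
    price: float
    currency: str
    features: dict = field(default_factory=dict)


def deep_search(offers, product_type, currency, max_price, **required):
    """Offers of one product type, within a price limit, having every required feature; cheapest first."""
    hits = [o for o in offers
            if o.product_type == product_type
            and o.currency == currency
            and o.price <= max_price
            and all(o.features.get(k) == v for k, v in required.items())]
    return sorted(hits, key=lambda o: o.price)


offers = [
    Offer("http://example.org/a", "Digital_single-lens_reflex_camera", 399.0, "EUR",
          {"brand": "Olympus", "live_view": True}),
    Offer("http://example.org/b", "Digital_single-lens_reflex_camera", 549.0, "EUR",
          {"brand": "Nikon", "live_view": True}),
    Offer("http://example.org/c", "Camera", 89.0, "EUR", {"brand": "Olympus"}),
]

for o in deep_search(offers, "Digital_single-lens_reflex_camera", "EUR", 500, live_view=True):
    print(o.uri, o.price)
```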

2011-01-16

Shopping starts at Google

I don't know where the Web ends. It may have multiple ends, or none. But I know where the Web starts. It starts at Google.

A few years back, it was reported that 6 % of all internet traffic starts at Google. Also, plenty of people have Google set as their homepage. I think many of us would agree that our brain is only a thin layer on top of Google.

One reason for using Google is that people don't remember URIs. On the Web, the address of a thing is a URI. In the human brain, the address of a thing is a set of associations that locate it in a neural network. That's why we need a way to translate these associations into a URI, and Google does it fairly well. You pass it a bunch of keywords related to the thing you are looking for and it produces a nice, ordered list of URIs that might point to the thing you have in mind.
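
The translation from associations to URIs can be caricatured as ranking URIs by how many of the query's keywords they are associated with. This is a deliberately naive sketch with a hypothetical, hand-made index; it says nothing about how Google actually ranks results.

```python
def rank_uris(query: str, index: dict) -> list:
    """Order URIs by the number of query keywords they share; ties broken alphabetically."""
    terms = set(query.lower().split())
    scored = [(len(terms & keywords), uri) for uri, keywords in index.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [uri for score, uri in scored if score > 0]


# Hypothetical index: each URI with the keywords people associate with it.
index = {
    "http://example.org/olympus-e-450": {"olympus", "camera", "dslr", "e-450"},
    "http://example.org/camera-reviews": {"camera", "reviews"},
    "http://example.org/cookbook": {"recipes", "cooking"},
}

print(rank_uris("cheap Olympus camera", index))
```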

People don't use URIs to describe the things they are thinking of; machines do. I can't remember URIs, especially those of RDF vocabularies, which tend to be quite long. That's why I use prefix.cc, which lets me find the URI I'm looking for by passing it something I can remember: the vocabulary's prefix. The service remembers the vocabularies' URIs for me.
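
The lookup that prefix.cc performs can be mimicked offline with a small table that expands a prefixed name such as foaf:Person into a full URI. The table is a hand-copied subset (the namespace URIs are the vocabularies' real ones); prefix.cc's own interface is not used here.

```python
# A hand-copied subset of well-known prefix-to-namespace mappings.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "gr": "http://purl.org/goodrelations/v1#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dcterms": "http://purl.org/dc/terms/",
}


def expand(curie: str, prefixes: dict = PREFIXES) -> str:
    """Expand a prefixed name like 'gr:Offering' into a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"unknown prefix in {curie!r}")
    return prefixes[prefix] + local


print(expand("gr:Offering"))  # http://purl.org/goodrelations/v1#Offering
```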

As it turns out, people don't remember the URIs of the things they want to buy either. So these days, a lot of shopping starts at Google. When you are looking to buy something you often start by describing that something to Google.

In commerce, things are addressed by brand. The problem is that people don't search for brands and they don't search for product names; they search for concepts. People don't search for Olympus E-450; they search for a camera. Brands and product names are not in their vocabularies, but concepts described by keywords are. People don't use brand names to describe the things they are thinking of; commerce does.

To bridge this gap you need to translate the keywords that people use to describe stuff into the brands that commerce uses to describe stuff. Enter search engine optimization (SEO). One of the things SEO does is create synonym rings. A synonym ring is a set of synonyms, the words people use to describe a thing, such as the words in this tweet:

Can you all please stop retweeting those SEO jokes, gags, cracks, funnies, LOLs, humour, ROFLs, chuckles, rib-ticklers, one-liners, puns?

This SEO task consists of collecting the keywords people might use when searching for a thing, so that they find your thing™, which you have described with these keywords.
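
A synonym ring can be represented as a set of interchangeable words. One common way to use it (an assumption here, not a description of any particular SEO tool) is to map every member of a ring to a single head term, so that a query and a page match if they use any words from the same ring. The rings below are illustrative; the first one is taken from the tweet.

```python
# Illustrative synonym rings; the head term (dict key) is an arbitrary choice.
SYNONYM_RINGS = {
    "joke": {"joke", "gag", "crack", "funny", "lol", "humour", "rofl",
             "chuckle", "rib-tickler", "one-liner", "pun"},
    "camera": {"camera", "dslr", "digicam"},
}

# Invert the rings: every synonym points to its ring's head term.
HEAD_TERM = {word: head for head, ring in SYNONYM_RINGS.items() for word in ring}


def normalise(text: str) -> set:
    """Lower-case the words and replace each ring member with its ring's head term."""
    return {HEAD_TERM.get(word, word) for word in text.lower().split()}


def matches(query: str, page_keywords: str) -> bool:
    """True if the query and the page share at least one normalised term."""
    return bool(normalise(query) & normalise(page_keywords))


print(matches("cheap digicam", "Olympus E-450 DSLR"))
```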

It would be better if you could say that your thing™ (e.g., Olympus E-450) is a kind of thing people search for (e.g., a camera). Then, when people searched for such a thing, they could find out that your thing™ is one. This is one of the promises of the semantic web vision. But, just like its Wikipedia article, the semantic web still has a lot of issues.
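
Saying that your thing™ is a kind of thing people search for can be sketched with a tiny bit of RDFS-style reasoning: if the Olympus E-450 is typed as a digital SLR, and digital SLRs are a subclass of cameras, a search for cameras should find it. The class hierarchy and the ex: names are hypothetical, and following rdfs:subClassOf transitively is only one small piece of what the semantic web stack defines.

```python
# Hypothetical data: what each thing is (rdf:type) and which classes are
# narrower than which (rdfs:subClassOf).
TYPES = {
    "ex:OlympusE450": "ex:DigitalSLR",
    "ex:TripodX": "ex:Tripod",
}
SUBCLASS_OF = {
    "ex:DigitalSLR": {"ex:Camera"},
    "ex:Camera": {"ex:PhotographyEquipment"},
    "ex:Tripod": {"ex:PhotographyEquipment"},
}


def superclasses(cls: str) -> set:
    """All classes reachable from cls by following rdfs:subClassOf transitively."""
    found, stack = set(), [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), set()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def instances_of(target: str) -> list:
    """Things whose type is target or any subclass of target."""
    return sorted(thing for thing, cls in TYPES.items()
                  if cls == target or target in superclasses(cls))


print(instances_of("ex:Camera"))                # finds the E-450 via ex:DigitalSLR
print(instances_of("ex:PhotographyEquipment"))  # finds both things
```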

Nevertheless, the semantic web vision has created some interesting by-products in the last few years. One of them is the Linked Open Data initiative, which strives to build a common, open data infrastructure for the semantic web that is coming (for sure). Another by-product of this vision is so-called semantic SEO.

Both the semantic web and semantic SEO are misnomers; there is nothing exceptionally semantic about them. I would rather call it data SEO, but it seems the current name will stick. Semantic SEO is the practice of adding a little bit of structured data (preferably in RDF) to websites instead of a bunch of keywords. For instance, you can use the GoodRelations RDF vocabulary to mark up the web page describing the product you're offering; even Google says you can. In semantic SEO a little bit of semantics is good enough; it can still go a long way.

Having your thing™ described with structured data makes it machine-readable. A search engine like Google is a kind of machine, so making your data machine-readable makes it readable for search engines. You can check for yourself how Google reads your data.
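
What "machine-readable" means here can be shown with a toy reader built on Python's html.parser: it picks up RDFa property attributes and their values (the content attribute if present, otherwise the element's text). It is not an RDFa processor, since it ignores subjects, rel, typeof, prefixes and datatypes, and it is certainly not how Google parses pages; it only shows that the attributes hand a machine named values instead of bare text.

```python
from html.parser import HTMLParser


class PropertyCollector(HTMLParser):
    """Toy reader: collects (property, value) pairs from RDFa-style attributes."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._prop = None   # property whose element text is being collected
        self._tag = None    # tag name of that element
        self._depth = 0     # nesting depth of same-named tags inside it
        self._text = []

    def handle_starttag(self, tag, attrs):
        if self._prop is not None:
            if tag == self._tag:
                self._depth += 1
            return  # nested properties are ignored in this toy
        a = dict(attrs)
        prop = a.get("property")
        if prop is None:
            return
        if "content" in a:
            self.pairs.append((prop, a["content"]))
        else:
            self._prop, self._tag, self._depth, self._text = prop, tag, 1, []

    def handle_endtag(self, tag):
        if self._prop is not None and tag == self._tag:
            self._depth -= 1
            if self._depth == 0:
                self.pairs.append((self._prop, "".join(self._text).strip()))
                self._prop = None

    def handle_data(self, data):
        if self._prop is not None:
            self._text.append(data)


def read_properties(page: str) -> list:
    collector = PropertyCollector()
    collector.feed(page)
    collector.close()
    return collector.pairs
```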

By adding a bit of data to the mark-up of your web page (preferably via RDFa) you can optimize the way it is displayed in Google's search results. Instead of a boring, text-only rendering you can get a display that contains useful information, such as an image of your thing™, its rating, reviews and the like. See the example on the GoodRelations website for a comparison.
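
A minimal sketch of such mark-up, generated with Python: the offer's name, image and price are wrapped in RDFa 1.1 attributes using GoodRelations terms (with foaf:depiction for the image). The values are hypothetical, and whether a search engine renders a rich snippet from it is up to the search engine; this is not necessarily the exact mark-up Google expects.

```python
from html import escape

PREFIX_ATTR = ("gr: http://purl.org/goodrelations/v1# "
               "foaf: http://xmlns.com/foaf/0.1/ "
               "xsd: http://www.w3.org/2001/XMLSchema#")


def offer_rdfa(name: str, price: float, currency: str, image_url: str) -> str:
    """Render an offer as HTML annotated with RDFa 1.1 and GoodRelations terms."""
    return f"""<div prefix="{PREFIX_ATTR}" typeof="gr:Offering">
  <h2 property="gr:name">{escape(name)}</h2>
  <img rel="foaf:depiction" src="{escape(image_url)}" alt="{escape(name)}" />
  <div rel="gr:hasPriceSpecification">
    <span typeof="gr:UnitPriceSpecification">
      <span property="gr:hasCurrencyValue" datatype="xsd:float">{price:.2f}</span>
      <span property="gr:hasCurrency" content="{escape(currency)}">{escape(currency)}</span>
    </span>
  </div>
</div>"""


print(offer_rdfa("Olympus E-450", 399.0, "EUR", "http://example.org/img/e450.jpg"))
```

The rel="gr:hasPriceSpecification" on the outer element, with typeof on the child, links the offer to a new price-specification node without needing a URI for it.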

People are more likely to click on a search result with a nice image in it, a result that is enriched with all kinds of useful information. This may lead to an increase in your click-through rate. For example, RDFa adoption at BestBuy resulted in a 30 % increase in search traffic. Pursuing the semantic web vision has been a largely academic undertaking, so it's good to see that its by-product, semantic SEO, has some real financial benefits.

The practice of semantic SEO is definitely not an academic endeavour. Quite the opposite: a lot of high-profile companies and institutions are adopting it (e.g., BestBuy, O'Reilly, or Tesco). The share of web pages that contain structured data in RDFa is growing. In October 2010, RDFa was found in 3.5 % of web pages, whereas a year before the share was 0.5 %.

E-commerce was one of the key factors that contributed to the growth of the Web in the 1990s. The same may become true for the Web of Data, a.k.a. linked data: e-commerce applications of semantic web technologies, such as semantic SEO, may become a crucial driver of its growth and accelerate the adoption of the linked data principles.