2011-09-24

Technical openness of open data

Apart from the legal requirements on open data, there are also aspects of technical openness. While the legal aspects are explicitly defined by the Open Definition, there is less understanding of the technical recommendations for making data open. Some principles of this side of openness are covered by the Three laws of open data by David Eaves; others are proposed in the Linked Open Data star scheme. An excellent resource that touches on both legal and technical requirements for open data is the 8 open government data principles.

Data need to be formalized so that we can serialize them to representations that may be exchanged. However, there are different formalizations that may be used for communicating data, different formats that are more or less open. I think open technologies for representing data share a set of family resemblances. So, open data are:

Non-exclusive
Open data are not published exclusively for a particular application. No application has exclusive access to open data. Instead, they are available to be used by any application and thus support a wide range of uses.

Non-proprietary
No entity has exclusive control over non-proprietary data formats. Such formats have an open specification that may be implemented by anyone. Therefore, data in these formats are not tightly coupled to any specific software that is able to read them.

Standards-based
The data are based on open, community-owned standards. This means the standards are developed in an open process that anyone from the public may join (which rules out, for example, Schema.org). Such standards prescribe a set of rules the data have to adhere to. Standardized data have an expected format, which ensures interoperability, so they can be used by a plethora of standards-compliant tools.

Machine-readable
Open data are formalized enough that machines are able to use them. Well-formalized data have a structure that enables automated processing. For instance, unlike a scanned document stored as an image, which is one opaque blob, open data have a higher granularity because they are segmented into well-defined data items (e.g., rows, columns, triples).
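To make the contrast with an opaque blob concrete, here is a minimal Python sketch. The CSV content and the column names (station, pm10) are made up for illustration. The only point is that a machine can address individual rows and columns, which it cannot do with a scanned image.

```python
import csv
import io

# Hypothetical sample data; the column names are assumptions for illustration.
CSV_TEXT = """station,pm10
Prague,21.5
Brno,18.0
"""


def read_items(text):
    """Parse CSV text into a list of dicts, one dict per row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_items(CSV_TEXT)

# Each value is individually addressable by row and column name,
# so automated processing such as averaging becomes trivial.
average = sum(float(row["pm10"]) for row in rows) / len(rows)
```

The same computation on a scanned table would first require OCR and guesswork about where the cells are.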

Findable
Open data should be publicly available on the Web. This means having URLs that successfully return a representation of the data. Data should be directly accessible by resolving their URLs. Any technical barriers to access, such as passwords or required registration, are unacceptable, as are any attempts to hide the data and achieve security through obscurity via anti-SEO techniques. As David Eaves puts it, if Google cannot find it, no one can.

Linkable
Elements of open data should be identified with URIs, which makes it possible to link to them. This approach encourages re-use, data integration, and proper attribution of data used as a source.
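As a rough illustration of what identifying data items with URIs might look like, the following Python sketch mints a URI for each record from a local identifier. The base URI, the record fields, and the mint_uri helper are all hypothetical; a real publisher would choose its own URI scheme.

```python
from urllib.parse import quote

# Hypothetical base URI; example.org is reserved for documentation purposes.
BASE = "http://example.org/dataset/air-quality/"


def mint_uri(local_id):
    """Build a URI for a data item by percent-encoding its local identifier."""
    return BASE + quote(local_id, safe="")


# Made-up records, keyed by the URI under which each one could be linked to.
records = [
    {"id": "station-1", "name": "Prague"},
    {"id": "station 2", "name": "Brno"},
]
identified = {mint_uri(record["id"]): record for record in records}
```

Once every item has its own URI, others can point at exactly the item they used rather than at the dataset as a whole.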

Linked
If your open data are linked to other open data, users can follow these links to discover more. Being a part of the Web of data brings the benefits of network effects.
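Following links to discover more can be sketched in a few lines of Python. The triples below are invented and held in memory; on the real Web of data each step would mean resolving a URI and reading the description it returns. The reachable function is a hypothetical helper, not part of any library.

```python
# Triples as (subject, predicate, object) tuples; all URIs are hypothetical.
TRIPLES = [
    ("http://example.org/station/1",
     "http://example.org/vocab/locatedIn",
     "http://example.net/city/prague"),
    ("http://example.net/city/prague",
     "http://example.net/vocab/partOf",
     "http://example.net/country/cz"),
    ("http://example.net/country/cz",
     "http://example.net/vocab/name",
     "Czech Republic"),
]


def reachable(start, triples):
    """Return every object (URI or literal) reachable from start via links."""
    seen = set()
    queue = [start]
    while queue:
        node = queue.pop()
        for subject, _predicate, obj in triples:
            if subject == node and obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return seen


found = reachable("http://example.org/station/1", TRIPLES)
```

Starting from a single station, a user ends up with the city, the country, and the country's name, even though the last two facts come from a different (here imaginary) dataset.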

As you might have guessed from the previous points, I think that linked data is a very open technology. And, if you look at the 5 stars of linked open data, their author Tim Berners-Lee thinks the same. So if you want to make your data more open, publishing them as linked data is a step in the right direction.
