The following post is an excerpt from my thesis entitled Linked open data for public sector information.
By default, reuse requires permission. Unless legal instruments enforce openness of data by default, an explicit open licence is needed. A licence serves as a legal tool that facilitates reuse [1, p. 6].
The licence should state clearly what users are allowed to do with the data. At the same time, the data should explicitly reference its licence to provide legal certainty. With explicit licences, users of data no longer find themselves in a legal vacuum with no clear guidance on how they can use the data.
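As a minimal sketch of what an explicit reference from data to its licence might look like, the following Python snippet builds a JSON-LD-style description that links a dataset to a licence through the Dublin Core `license` property. The dataset identifier, the title, and the helper names `describe_dataset` and `has_explicit_licence` are hypothetical and serve only as an illustration.

```python
import json

# Namespace of the Dublin Core terms vocabulary.
DCT = "http://purl.org/dc/terms/"


def describe_dataset(dataset_uri, title, licence_uri):
    """Return a JSON-LD-style description linking a dataset to its licence."""
    return {
        "@context": {"dct": DCT},
        "@id": dataset_uri,
        "dct:title": title,
        "dct:license": {"@id": licence_uri},
    }


def has_explicit_licence(description):
    """Return True if the description names a licence by a non-empty identifier."""
    licence = description.get("dct:license")
    return isinstance(licence, dict) and bool(licence.get("@id"))


# Hypothetical dataset; only the CC0 URI refers to a real resource.
description = describe_dataset(
    "http://example.org/dataset/budget-2012",
    "Example budget dataset",
    "https://creativecommons.org/publicdomain/zero/1.0/",
)
print(json.dumps(description, indent=2))
```

A consumer could call `has_explicit_licence` on such a description before reusing the data, which is one simple way to detect datasets published without any licensing information.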
However, even though explicit licensing is a fundamental requirement for publishing both open and non-open data, data producers often neglect it. For example, 82.16 % of data sources in the Linked Open Data Cloud, the diagram that gives an overview of linked open data sources, do not provide any licensing information [2]. A similar situation may be observed for Czech public sector data, for which the licence is left unspecified in the majority of cases.
An essential goal of open licences is to give everyone equal opportunity to access and use the licensed work. An open licence should thus be non-exclusive and non-discriminatory, enabling free reuse and redistribution of the licensed data. It should be agnostic of both users and types of use. Therefore, it should not discriminate against any persons or groups, fields of endeavour, or types of prospective use of the data. Open licences should permit any type of reuse, including modification and creation of derivative data, and any type of redistribution that provides others with access to the data.
Access to data must not be restricted by administrative barriers or geography. Limiting access rights to citizens of a particular country is unacceptable. Likewise, enabling access only to a pre-defined group of people is not sufficient. For example, the Creative Commons Developing Nations License makes licensed content open only to citizens in developing countries and as such is not considered to be an open licence.
Even though the primary objective of open licences is to remove obstacles to access and use, licences may stipulate some permissible requirements that licensees using the licensed content need to comply with. At most, an open licence may require attribution to the original author and redistribution under the same or an analogous licence.
However, the requirement for attribution can cause difficulties when multiple datasets are reused and combined. This problem is known as “attribution stacking” because the number of parties that have to be attributed increases with the number of datasets that are involved in reuse and come from different authors.
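The effect can be illustrated with a small Python sketch that collects the parties to be credited when several datasets are combined. The dataset records, the agency names, and the function `required_attributions` are hypothetical, and the model is deliberately simplified to a single yes/no attribution flag per dataset.

```python
def required_attributions(datasets):
    """Collect the distinct authors that must be credited for a combined dataset."""
    credited = []
    for dataset in datasets:
        # Only datasets whose licence requires attribution add to the stack,
        # and each author is listed once even if several datasets share them.
        if dataset["requires_attribution"] and dataset["author"] not in credited:
            credited.append(dataset["author"])
    return credited


# Hypothetical source datasets combined in one application.
sources = [
    {"name": "addresses", "author": "Agency A", "requires_attribution": True},
    {"name": "budgets", "author": "Agency B", "requires_attribution": True},
    {"name": "gazetteer", "author": "Agency C", "requires_attribution": False},
    {"name": "contracts", "author": "Agency A", "requires_attribution": True},
]
print(required_attributions(sources))
```

Every additional source from a new author that requires attribution lengthens the returned list, which is the stacking effect described above.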
A problem similar to attribution stacking arises with share-alike licences, which require the same or an analogous licence to be used for redistribution. Share-alike licences are “viral” licences that spread with the licensed content as their carrier. They may prove difficult to work with when data available under the terms of different viral licences are combined and redistributed.
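The difficulty of combining data under different viral licences can be sketched in Python as follows. This is a deliberately simplified model: it treats any two distinct share-alike licences as incompatible and ignores the possibility that a licence accepts an "analogous" one. The licence identifiers `PD`, `SA-1`, and `SA-2` and the function names are placeholders, not real licences or an established API.

```python
def share_alike_licences(licences):
    """Return the distinct share-alike licence identifiers among the inputs."""
    return {licence["id"] for licence in licences if licence["share_alike"]}


def can_redistribute_under_one_licence(licences):
    """Simplified rule: at most one distinct share-alike licence may be involved."""
    return len(share_alike_licences(licences)) <= 1


# Placeholder licences: one without a share-alike clause, two with one.
PD = {"id": "PD", "share_alike": False}
SA_1 = {"id": "SA-1", "share_alike": True}
SA_2 = {"id": "SA-2", "share_alike": True}

print(can_redistribute_under_one_licence([PD, SA_1]))
print(can_redistribute_under_one_licence([SA_1, SA_2]))
```

Under this assumption, a combination that mixes `SA-1` and `SA-2` cannot satisfy both licences at once, whereas combining either of them with content that carries no share-alike clause remains possible.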
Open data should be published under a standard, generic licence. A custom licence makes the use of data more cumbersome, because the user first has to study the unfamiliar licence instead of relying on the terms and conditions of a well-known one. Thus, the use of a custom licence may imply high transaction costs associated with using the licensed content.
The way users interface with data may be made even more uniform if a single licence is applied. In a controlled setting, such as the public sector, establishing a unified licence is encouraged to simplify conditions of use, particularly for combining multiple datasets. Nevertheless, data provision under the terms of one licence is unlikely to scale. The conditions surrounding data are too varied for any single licence to cover.
Open data licences are considered to be those that conform with the Open Definition. The Open Definition is a widely established definition of what it means for information to be open. “A piece of content or data is open if anyone is free to use, reuse, and redistribute it — subject only, at most, to the requirement to attribute and share-alike” [3]. The definition focuses on the legal aspects of openness and as such is closely tied to licences that enable open distribution.
Several existing licences meet the requirements for legal openness of open data. Some of them are generic licences that may be used regardless of context.
For example, the generic licences commonly recommended for open data include Creative Commons Zero (CC0) and Open Data Commons Public Domain Dedication and License (ODC PDDL). Strictly speaking, CC0 is not a licence but a waiver that places the content in the public domain. As discussed in the previous parts, legislation in some states does not allow content to enter the public domain by artificial means, such as a waiver. In such cases, ODC PDDL may be applied because it contains not only a waiver but also a licence agreement, which sets the conditions of use for the licensed content to be the same as for public domain content.
General-purpose licences may be substituted by licences with a specific purpose. An example of this type of licence is the UK Open Government Licence, which was designed specifically for releasing open data in the UK public sector.


