Applying linked open data to public procurement

The following post comprises some of my notes for the talk Applying Linked Open Data to Public Procurement, which I gave at the EU PSI Group meeting on January 24, 2013.

Public procurement is quite a specific domain. It is the domain in which the public sector and business interface, forming business deals that trade public funding for goods and services. It is a huge intersection that, as of 2010, accounts for 17.3 % of the gross domestic product of the European Union (source [PDF]). For example, it amounts to almost 20 billion EUR in the Czech Republic. If I were to use a technical term, then “profiling the EU” might show public procurement as its hot-spot. Therefore, I think that applying linked open data to public procurement in the EU is optimizing where it matters.

The life-cycle of data in this domain is “solitary, poor, nasty, brutish, and short” (source). Transience is natural to public procurement data. The data comes in streams of public notices, calls for tenders and the like, and loses most of its value once opportunities for businesses are turned into contracts that are awarded and signed. The life-cycle of the data is tightly bound to the applications that are used either to produce or consume it. The data could serve a lot longer than it does if it were not trapped inside applications and restrictive licensing regimes. Even public procurement data from the past, while progressively losing its perceived value over time, may well serve for analytics.

In this respect linked data helps by decoupling data from applications. It models data in an application-agnostic way so that it can power all kinds of applications. Moreover, semantic web technologies offer means to decouple data from natural languages by describing data in a structured way, so that language understanding is not required for all types of useful data processing. In this way, linked data prolongs the life-cycle of data and makes it work both for the public sector and the public.
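To make the decoupling concrete, here is a minimal sketch in plain Python of a contract notice described as subject-predicate-object statements. Every URI and property name in it (the `http://example.org/` namespace, `mainObject`, `label` and so on) is a hypothetical placeholder that I made up for illustration, not a real vocabulary. The point it tries to show is that a contract can be found by the identifier of its subject matter, while human-readable labels in any language are attached separately.

```python
# A contract notice modelled as (subject, predicate, object) triples.
# All URIs below are hypothetical placeholders, not a real vocabulary.
EX = "http://example.org/"

triples = {
    (EX + "contract/1", EX + "vocab/type", EX + "vocab/Contract"),
    (EX + "contract/1", EX + "vocab/contractingAuthority", EX + "authority/42"),
    (EX + "contract/1", EX + "vocab/estimatedPriceEUR", 150000),
    (EX + "contract/1", EX + "vocab/mainObject", EX + "classification/road-works"),
    # Labels are attached to the classification concept, one per language.
    (EX + "classification/road-works", EX + "vocab/label", ("Road works", "en")),
    (EX + "classification/road-works", EX + "vocab/label", ("Silniční práce", "cs")),
}


def objects(subject, predicate):
    """Return all objects of triples matching the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def contracts_about(concept):
    """Find contracts by concept identifier, without reading any label."""
    return sorted(s for s, p, o in triples
                  if p == EX + "vocab/mainObject" and o == concept)


def label(concept, lang):
    """Pick the label of a concept in the requested language, if present."""
    for text, tag in objects(concept, EX + "vocab/label"):
        if tag == lang:
            return text
    return None
```

An application for Czech suppliers and one for English-speaking analysts could both run `contracts_about` on the same data and differ only in the language they pass to `label`.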

Information about public procurement is distributed unevenly. Not all parties interested in public procurement have an equal level of access to the information describing it. Access to information is vital for the pre-award stage of procurement, which is in fact also called the information stage. However, the distribution of information in this domain is asymmetric, which may be the root cause of the inefficiencies the domain is infested with. Linked data, and linked open data in particular, strives for an equal level of access to information, so that everyone starts with the same initial conditions when participating in public procurement.

Reforming the way data in public procurement works is needed to ensure optimal functioning in the domain on a number of levels.

Public procurement is an area well-saturated with money and so it presents numerous opportunities for corruption and affords systemic inefficiencies. Thus, it is crucial to procure public contracts in a transparent manner. In public procurement, contracting should be done in public; it should be documented and available to members of the public. Documents should clearly show that decisions about public contracts are subordinate to public interest, not self-interest. In this way, transparency may yield accountability, as individual decisions can be attributed to decision-makers.

Yet most of the volume of public procurement is not transparent. What is governed by the current rules (e.g., in the European Union) is only a subset of all public procurement. These are the public contracts that exceed the thresholds for mandatory reporting. The contracts below these thresholds form the long tail of public procurement. For example, it is estimated that 85 % of public contracts in the EU are not announced via the central system TED (source). Even though some of this massive “dark matter” of public procurement is available through local portals, it constitutes information that is difficult to reach.

Transparency and symmetric distribution of information configure public procurement for fairer competition. If all interested parties have equal access to information about public procurement, then their opportunity to participate in procurement is equal, and eventually, more openness leads to higher competition.

Ensuring equal initial conditions is particularly important for increasing the volume of cross-country procurement. Even if companies and individuals frequently buy from suppliers in other countries, such behaviour is still quite rare in public procurement. Having the data on opportunities in public procurement openly available, in a structured format that does not rely on natural language description, could help boost the number of cross-country public contracts.

In the end, what optimizations of the flow of data in public procurement bring is efficiency. They might help the public sector to achieve better resource allocation and exploit the potential for cost savings. In general, the quality of public procurement depends on the quality of procurement data: better data affords better operation.


I have tried to outline some of the potential benefits of applying linked open data to public procurement. In many cases such application would lower or remove the barriers that are present in the current public procurement systems, such as:
  • Unclear licensing
    What does '© European Union' mean? Discovering the correct interpretation of the legal conditions governing the use of public procurement data may often require a significant investment of time. Especially in the case of national portals, the conditions of use are either vague, missing or burdened with special disclaimers. When foreign public contracts are of concern, delving into local legislation may prove to be a time-consuming exercise. In any of these cases, the explicit and standardized licences recommended for open data would help.
  • Machine readability
    Being hardly accessible for machines, many of the data sources in public procurement seem to have exclusive arrangements with humans. Such datasets are embedded in web pages, out of which the original structured data has to be reconstructed by techniques such as screen-scraping. We have already learnt that screen-scraping incurs a high marginal cost of reproduction. To avoid this unnecessary cost, data sources should provide direct access to raw, structured data.
  • Entity reconciliation
    Basically, entity reconciliation is a technical term for merging records that describe the same entity. Unfortunately, in many cases public procurement data sources do not offer enough information to determine if two entities are the same. Identifiers are helpful, yet in some sources they occur fairly rarely, even though reporting them is mandatory. In practice, doing things like deduplication of business entities proves to be difficult without having their rich descriptions available in data.
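The entity reconciliation problem can be illustrated with a naive sketch in Python. The supplier records, their field names and the fallback matching rule are all assumptions of mine, invented for illustration; real reconciliation would need much richer descriptions than this. The sketch groups records by an official identifier when one is present and otherwise falls back to a weaker key built from the country and a normalized name.

```python
import re
from collections import defaultdict

# Hypothetical supplier records as they might appear in two different sources.
records = [
    {"source": "A", "name": "ACME s.r.o.", "id": "12345678", "country": "CZ"},
    {"source": "B", "name": "Acme, s. r. o.", "id": None, "country": "CZ"},
    {"source": "B", "name": "ACME Ltd", "id": None, "country": "GB"},
    {"source": "A", "name": "Acme", "id": "12345678", "country": "CZ"},
]


def normalize(name):
    """Lower-case a name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def match_key(record):
    """Prefer the official identifier; otherwise fall back to a weaker key."""
    if record["id"]:
        return ("id", record["country"], record["id"])
    return ("name", record["country"], normalize(record["name"]))


def reconcile(records):
    """Group records that share a match key."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)
    return dict(groups)


groups = reconcile(records)
```

Here the two records carrying the same identifier end up in one group even though their names differ, whereas the Czech record from source B, which lacks an identifier, stays on its own although it probably describes the same company. That is exactly the gap that missing identifiers leave behind.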
I think that linked open data can help to remove these roadblocks on the way to a single, global market for public procurement. The potential of new technologies and their impact on public procurement is immense. The European Commission recognized it in an official decision saying that:
“The new information and communication technologies have created unprecedented possibilities to aggregate and combine content from different sources.”
Combining and merging data is the area where linked data shines the most. Ultimately, if we want to combine markets, we must first combine the data they are built on. Let's see if linked data technologies offer a good “glue” to mash up public procurement markets.