Showing posts with label publicsectorinformation. Show all posts

2012-10-08

How the Big Clean addresses the challenges of open data

The Big Clean 2012 is a one-day conference dedicated to three principal themes: screen-scraping, data refining and data-driven journalism. These topics address some of the current challenges of open data, focusing on usability, misinterpretation of data and on the issue of making data-driven journalism work.

Usability

A key challenge of the Big Clean is refining raw data into usable data. People often fall victim to the fallacy of treating screen-scraped data as a resource that can be used directly, fed straight into visualizations or analysed to yield insights. However, the validity of data must not be taken for granted. It needs to be questioned.
Just as some raw ingredients need to be cooked to become edible, raw data needs to be preprocessed to become usable. Patchy data extracted from web pages should be refined into data that can be relied upon. Cleaning data makes it more regular, error-free and ultimately more usable.
The Big Clean will take this challenge into account in several talks. Jiří Skuhrovec will try to strike a fine balance, considering the question of how much we need to clean. Štefan Urbánek will walk the event's participants through a data processing pipeline. Apart from the invited talks, this topic will be the subject of a screen-scraping workshop led by Thomas Levine. The workshop will run in parallel with the main track of the conference.
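As a rough illustration of what such refining involves, the following sketch normalizes a single screen-scraped record. The field names, the Czech-style number and date formats, and the functions clean_row and clean are assumptions made for this example only; they are not taken from any of the talks.

```python
from datetime import datetime


def clean_row(raw):
    """Normalize one scraped record; return None if it cannot be trusted.

    Assumes hypothetical fields "name", "amount" and "date", with amounts
    written like "1 250,50" and dates like "03.11.2012".
    """
    # Collapse stray whitespace and line breaks left over from the HTML.
    name = " ".join(raw.get("name", "").split())
    # Remove thousands separators and switch to a decimal point.
    amount_text = (
        raw.get("amount", "")
        .replace("\u00a0", "")
        .replace(" ", "")
        .replace(",", ".")
    )
    try:
        amount = float(amount_text)
        date = datetime.strptime(raw.get("date", "").strip(), "%d.%m.%Y").date()
    except ValueError:
        # Question the data: a value that does not parse is not silently kept.
        return None
    if not name:
        return None
    return {"name": name, "amount": amount, "date": date.isoformat()}


def clean(rows):
    """Keep only the records that survived cleaning."""
    cleaned = (clean_row(row) for row in rows)
    return [row for row in cleaned if row is not None]
```

Dropping records that fail to parse is only one possible policy; depending on the use case, it may be preferable to keep them aside for manual review.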

Misinterpretation

Access to raw data allows people to take control of the interpretation of data. Effectively, people are not only taking hold of uninterpreted data, but also of the right to interpret it. This is not the case in the current state of affairs, where there is often no access to raw data, since all data is mediated through user interfaces. In such cases, the interface owners control the ways in which data may be viewed. In contrast, raw data gives you the freedom to interpret data on your own. It allows you to skip the intermediaries and access data directly, instead of limiting yourself to the views provided by the interface owners.
While the loss of control over presentation of data may be perceived as a loss of control over the meaning of the data, it is actually a call for more explicit semantics in the data. It is a call for an encoding of the meaning in data in a way that does not rely on the presentation of data.
A common excuse for not releasing data held in the public sector is the assumption that the data will be misinterpreted. As reported in Andrew Stott's OKCon 2011 talk, among civil servants there is a widespread expectation that “people will draw superficial conclusions from the data without understanding the wider picture.” This expectation can be challenged on two grounds. First, there is not a single correct interpretation of data possessed by the public sector. Instead, there are multiple valid interpretations that may coexist. Second, the fact that data is prone to incorrect interpretation may not attest to the ambiguity of the data, but to the ambiguity of its representation.
Tighter semantics may make the danger of misinterpretation less probable. As examples such as Data.gov.uk in the United Kingdom have shown, one way to encode clearer interpretation rules directly into the data is by using semantic web technologies.

Data-driven journalism

Nevertheless, in most cases public sector data is not self-describing. The data is not smart and thus people interpreting it need to be smart. A key group that needs to become smarter, reading the clues conveyed in data, consists of journalists. Journalists should read data, not only press releases. As they become data literate, the importance of their work increases. They serve as translators, mediating understanding derived from data to the wider public. In this way, data-driven journalism contributes to the goal of making data more usable, as stories told with data are more accessible than the data itself.
Raw data opens space for different and potentially competing interpretations. This is the democratic aspect of open data. It invites participation in a shared discourse constructed around the data. The media are a fundamental element of such discourse. Journalists using the data may contribute to this conversation by finding what is new in the data, discovering issues hidden from public oversight or tracing the underlying systemic trends. This is the key contribution of data-driven journalism: providing diagnoses of present-day society.
The principal role of data-driven journalism in the open data ecosystem will be reflected in a couple of talks given at the Big Clean. Liliana Bounegru will explain why data journalism is something you too should care about and Caelainn Barr will showcase how EU data can be used in journalism.

Practical details

The Big Clean will be held on November 3rd, 2012, at the National Technical Library in Prague, Czech Republic. You can register by following this link. Admission to the event is free.
I hope to see many of you there.

2012-08-24

Linked open data in the public sector

The following post is an excerpt from my thesis entitled Linked open data for public sector information.
Having reviewed the theoretical foundations for technical openness and data quality of linked data, this section turns to the ways in which linked open data is used in practice in the public sector. Contrary to popular belief, linked open data is no longer confined to research institutes producing pilots and prototypes. It is used in practice, and the public sector is one of the central areas in which linked data is being adopted.
To find out about the role of public sector data in the ever-increasing web of data, the Linked Open Data Cloud diagram may be consulted. This diagram depicts the connections between the existing linked data sources that are published under the terms of an open licence. Progressive changes made to this diagram over time illustrate the growth of the web of data that now contains more than a billion triples. The cloud is partitioned into broad subject categories that include a category for “government”. According to the State of the LOD Cloud [1] survey from September 2011, the datasets in this category represented 42.09 % of triples in the cloud. However, these datasets accounted for only 3.84 % of outbound links to external datasets.
The Linked Open Data Cloud features datasets from the public sector of a number of countries. The U.S. is represented by its pioneering Data.gov project started by the Obama administration in May 2009. In the United Kingdom, the adoption of linked open data in the public sector was kick-started by research projects, such as AKTivePSI [2] at the University of Southampton. The research activity quickly developed into an official part of the work of the public sector and gave rise to Data.gov.uk, one of the most comprehensive and progressive government data catalogues to date. Among other countries, initial experiments with linked open data for data produced in the public sector are also being conducted in the Czech Republic by the unofficial initiative OpenData.cz.
The thriving growth of linked open data activities in the public sector pointed to a need for coordination and development of standards and best practices. The W3C has taken the lead and established the Government Linked Data Working Group to help guide the adoption of linked open data in the public sector. The group is scheduled to run until 2013, but it has already published several recommendations, such as the Cookbook for open government linked data [3].

References

  1. BIZER, Chris; JENTZSCH, Anja; CYGANIAK, Richard. State of the LOD Cloud [online]. Version 0.3. September 19th, 2011 [cit. 2012-04-11]. Available from WWW: http://www4.wiwiss.fu-berlin.de/lodcloud/state/
  2. ALANI, Harith; CHANDLER, Peter; HALL, Wendy; O’HARA, Kieron; SHADBOLT, Nigel; SZOMSZOR, Martin. Building a pragmatic semantic web. IEEE Intelligent Systems. May—June 2008, vol. 23, iss. 3, p. 61 — 68. Also available from WWW: http://eprints.soton.ac.uk/265787/1/alani-IEEEIS08.pdf. ISSN 1541-1672. DOI 10.1109/MIS.2008.42.
  3. HYLAND, Bernadette; TERRAZAS, Boris Villazón; CAPADISLI, Sarven. Cookbook for open government linked data [online]. Last modified on February 20th, 2012 [cit. 2012-04-11]. Available from WWW: http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

2012-08-12

Open data infrastructure of the public sector

The following post is an excerpt from my thesis entitled Linked open data for public sector information.
Information infrastructure is a necessary prerequisite for all information-demanding services. In his treatment of networks, Yochai Benkler describes the need for a shared infrastructure.
“To flourish, a networked information economy rich in social production practices requires a core common infrastructure, a set of resources necessary for information production and exchange that are open for all to use. This requires physical, logical, and content resources from which to make new statements, encode them for communication, and then render and receive them” [1, p. 470].
Ursula Maier-Rabler ties these insights to the public sector. “The prerequisite for the functioning of networks is a common infrastructure. The role of government is to provide that infrastructure” [2, p. 187].
In the current state of affairs, there are multiple fragmented infrastructures that the performance of public functions depends on. Moreover, it is common that these infrastructures are available to dedicated applications only, while being closed to applications from other parts of the public sector, let alone the ones created by members of the public. These information infrastructures are neither shared nor open.
Open data may serve as a data infrastructure of the public sector. By definition, it constitutes a fundamentally open and shared infrastructure that is in line with Benkler’s vision. Such infrastructure not only enables public services to run but, because it is open to everyone, also enables private services to run. Building such infrastructure is the goal of open data initiatives and policies.

References

  1. BENKLER, Yochai. The wealth of networks: how social production transforms markets and freedom. New Haven: Yale University Press, 2006. ISBN 978-0-300-11056-2.
  2. MAIER-RABLER, Ursula; HUBER, Stefan. “Open”: the changing relation between citizens, public administration, and political authority. eJournal of eDemocracy and Open Government [online]. 2011 [cit. 2012-03-15], vol. 3, no. 2, p. 182 — 191. ISSN 2075-9517. Available from WWW: http://www.jedem.org/article/view/66

2012-08-11

Open data for public sector information

The following post is an excerpt from my thesis entitled Linked open data for public sector information.
Like data in general, public sector information seems to be predisposed to be opened. The key argument in favour of opening up public sector information is that this information belongs to the public. Joseph Stiglitz, a noted economist, writes: “[...] Who owns the information? Is it the private province of the government official, or does it belong to the public at large? I would argue that information gathered by public officials at public expense is owned by the public – just as the chairs and buildings and other physical assets used by government belong to the public” [1, p. 7]. Collection and maintenance of public sector data is paid for from public funds derived from tax revenue. Therefore, the data should be treated as a public good, which enables equal levels of access and use not only for public sector officials, but for every citizen as well. In other words, paraphrasing an Internet meme, “All your data are belong to us” [2, p. 241].
The public owns the public sector data and demands that it be openly available [3]. In 2010, a survey by Socrata showed that there was strong support for open data in the public sector [4]. It showed that 92.6 % of civil servants would commit to open data and that 67.2 % of citizens agreed with opening up public sector data. The interest of citizens in data from the public sector may also be illustrated by the existence of community alternatives to public sector data [5]. For example, the demand for geo-spatial data may be demonstrated by projects like OpenStreetMap, for which volunteers are “re-engineering” the data that should have been provided by the public sector.
Given the predisposition of public sector information to being opened, the demand for it, and the technologies that make opening it possible, one may expect an increase in activity in this domain. Open data in the public sector went from being a niche cause to being pervasive throughout the world. Now, there are over a hundred initiatives opening up data in the public sector world-wide [6], building up to a global, networked data infrastructure.

References

  1. STIGLITZ, Joseph E. On liberty, the right to know, and public discourse: the role of transparency in public life. Oxford Amnesty Lecture. Oxford (UK), 1999. Also available from WWW:
  2. LATHROP, Daniel; RUMA, Laurel (eds.). Open government: collaboration, transparency, and participation in practice. Sebastopol: O'Reilly, 2010. ISBN 978-0-596-80435-0.
  3. ARTHUR, Charles; CROSS, Michael. Give us back our crown jewels. Guardian [online]. March 9th, 2006 [cit. 2012-03-09]. Available from WWW: http://www.guardian.co.uk/technology/2006/mar/09/education.epublic
  4. Socrata. 2010 open government data benchmark study [online]. Version 1.4. Last updated January 4th, 2011 [cit. 2012-04-07]. Available from WWW:
  5. FIORETTI, Marco. Open data, open society: a research project about openness of public data in EU local administration [online]. Pisa, 2010 [cit. 2012-03-10]. Available from WWW: http://stop.zona-m.net/2011/01/the-open-data-open-society-report-2/
  6. DAVIES, Tim; BAWA, Zainab Ashraf. The promises and perils of open government data (OGD). Journal of Community Informatics [online]. 2012 [cit. 2012-04-12], vol. 8, no. 2. Available from WWW: http://ci-journal.net/index.php/ciej/article/view/929/926. ISSN 1712-4441.

2012-08-03

Pricing models for disclosure of public sector information

The following post is an excerpt from my thesis entitled Linked open data for public sector information.
The disclosure of information might be subject to a charge. However, making access to public sector information conditional on payment may constitute a fundamental barrier.
The models for pricing public sector information may be divided into three groups. In the first model, public bodies act as private companies and try to recover the costs incurred in information production. In the second, the marginal cost model, public bodies charge only to recover the cost of information provision. To adopt the third model is to cease charging altogether and not require users of information to pay any price.

Cost recovery model

Public sector institutions are usually free to recoup some costs by charging users that access their information [1, p. 11]. When they employ cost recovery pricing, they essentially behave the same way as for-profit companies.
Aside from the benefit of public bodies being able to sustain themselves, this model introduces a number of challenges. First, it discriminates against those that cannot afford to pay for access to the information they are interested in. For example, full cost recovery may have an adverse effect on small and medium-sized enterprises that do not have the necessary resources to obtain the information they need in order to pursue their business plan. Second, a large share of the consumers of public sector information consists of other public sector bodies. If full cost recovery is demanded from public bodies, it reduces public sector information to an instrument for reallocating public funding.

Marginal cost model

Marginal cost pricing recoups only the costs of information provision. It is derived from the marginal cost of distribution, which reflects the cost of providing one further unit. This pricing model is recommended by the EU directive on the re-use of public sector information [2]. If public bodies adopt the marginal cost pricing model and start charging less for their information, they might see a surge of interest in the information that might lead to a greater total income than in cases when the bodies employ the full cost recovery model. The use of the Web brings this pricing model close to the model that applies no prices, because on the Web the marginal cost of distribution covering the reproduction of information is essentially zero.
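The claim that a lower price can raise total income may be illustrated with a small calculation. All prices and request counts below are hypothetical; they show the mechanism only and are not figures from the cited sources.

```python
import math


def total_income(price_per_request, number_of_requests):
    """Income of a public body that charges a flat price per request."""
    return price_per_request * number_of_requests


# Hypothetical scenario: a high price under full cost recovery deters
# most users, while a low marginal cost price attracts many more.
cost_recovery_income = total_income(price_per_request=1000, number_of_requests=10)
marginal_cost_income = total_income(price_per_request=50, number_of_requests=400)

# Number of requests needed at the lower price to match the income
# earned under full cost recovery.
break_even_requests = math.ceil(cost_recovery_income / 50)

print(cost_recovery_income, marginal_cost_income, break_even_requests)
```

Under these assumed numbers, the lower price pays off once demand grows more than twentyfold; whether such growth materializes is an empirical question.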

Open access model

In the open access model, a public body does not require payment for the provision of information to the public. This approach entails a significant reduction of friction and administrative overhead associated with each individual transaction of public sector information. It is a non-discriminatory model, since it makes access to information independent of the user’s budget.
A common argument for no pricing is that public sector information has already been paid for from tax revenue and thus there should not be any additional charges [3, p. 55]. Pricing for information is seen as inconsistent with the established way of funding public sector bodies. The public sector should not run a business, and some contend that the civil service is too inflexible to do so [4].
Several alternative models to recover partial costs were proposed to substitute for the direct cost recovery. For example, one model suggested imposing a levy on requests for updates of public data, such as in business registers [1, p. 27].

References

  1. VICKERY, Graham. Review of the recent developments on PSI re-use and related market developments [online]. Final version. Paris, 2011 [cit. 2012-04-19]. Available from WWW: http://ec.europa.eu/information_society/policy/psi/docs/pdfs/report/psi_final_version_formatted.docx
  2. EU. Directive 2003/98/EC of the European Parliament and of the Council of 17 November 2003 on the re-use of public sector information. Official Journal of the European Union. 2003, vol. 46, L 345, p. 90 — 96. Also available from WWW: http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2003:345:0090:0096:EN:PDF. ISSN 1725-2555.
  3. Beyond access: open government data & the right to (re)use public information [online]. Access Info Europe, Open Knowledge Foundation, January 7th, 2011 [cit. 2012-04-15]. Available from WWW: http://www.access-info.org/documents/Access_Docs/Advancing/Beyond_Access_7_January_2011_web.pdf
  4. ARTHUR, Charles; CROSS, Michael. Give us back our crown jewels. Guardian [online]. March 9th, 2006 [cit. 2012-03-09]. Available from WWW: http://www.guardian.co.uk/technology/2006/mar/09/education.epublic

2012-08-02

Disclosure of public sector information

The following post is an excerpt from my thesis entitled Linked open data for public sector information.
The regulations require public bodies to take on an obligation of providing access to information they possess. The EU directive on the re-use of public sector information holds the disclosure of public sector information to be a “fundamental instrument for extending the right to knowledge, which is a basic principle of democracy” [1, p. 92]. In the light of this assertion, public bodies should ensure wide dissemination and long-term preservation of the information they produce.

Scope of disclosure

Public sector information is an umbrella term for all content produced by public bodies [2, p. 5]. Nonetheless, there are several exceptions to this rule, when defining the information that should be disclosed.
Public sector information covers any non-personal data held, collected or produced by a public body as a part of the public task, with the exception of the information relating to national security [3, p. 6]. Therefore, disclosure of public sector information should not apply to information that would abrogate individual privacy rights or endanger national security [4]. However, when left unquestioned, the goal of national security may lead public sector bodies to be overprotective of some data. For example, for some time in the US data about dams were not available due to the fear of misuse for terrorist attacks [5, p. 330].
In the EU, several types of public sector information are exempted from the requirement of disclosure. Public sector information held by cultural heritage institutions, such as libraries, museums, and archives, currently falls under a different regime. It often has different qualities than the information from other parts of the public sector. This type of information is mostly static, held as a record, and not directly associated with the pursuit of public tasks [6, p. 7]. Similarly, public broadcasting and research information generated by educational institutions is usually exempt from the scope of the definition of public sector information. However, apart from the exceptions listed individually, all public sector information is subject to the requirement of disclosure.

Types of disclosure

The approaches to disclosure of public sector information are usually categorized either according to the extent of disclosed information or by the activity of the public body.
The information that gets released might be limited to a summary of the full information the public body possesses. Summary disclosure is used for informing about the decisions made by public bodies. On the other hand, full disclosure is used for informing the decisions of the public. For example, in the case of elections, decisions of the members of the public are based on information from public bodies. Based on the distinction of the source of initiative that drives the disclosure, there are two models of information provision in the public sector: reactive and proactive [7, p. 155].

Reactive disclosure

Reactive disclosure is an on-demand, passive dissemination of public sector information that “implies an (enforceable) right for a subject to access to information on request” [8]. It institutes a permission culture of freedom of information requests. Joshua Tauberer criticizes reactive disclosure, because it provides only “a very narrow view of the public sector that is based on the requested snap-shots of data” [9]. This model is characterized by a strong information control and a lack of high-level political and bureaucratic support for open government and as such, reactive disclosure is unsuitable for the realization of this vision.

Proactive disclosure

Proactive disclosure is an active dissemination of public sector information that “means that the information is publicly available on the basis of a direct initiative of the public body” [8]. This type of disclosure may also be referred to as “suo motu” disclosure, which comes from the Latin for “upon its own initiative” [10, p. 69]. Proactive disclosure thus requires a switch from “presumption of non-disclosure to presumption of openness” [Ibid., p. 66]. With such a presumption, public sector information is thought of as a public resource, as something to be shared. This way of disclosure is “suited for mediators” [9], who can transform the information and add value to it. An example of a model for proactive disclosure is open data, which will be discussed further in greater detail.

References

  1. EU. Directive 2003/98/EC of the European Parliament and of the Council of 17 November 2003 on the re-use of public sector information. Official Journal of the European Union. 2003, vol. 46, L 345, p. 90 — 96. Also available from WWW: http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2003:345:0090:0096:EN:PDF. ISSN 1725-2555.
  2. SCHELLONG, Alexander; STEPANETS, Ekaterina. Unchartered waters: the state of open data in Europe [online]. CSC, 2011 [cit. 2012-04-12]. Public sector study series, 01/2011. Available from WWW: http://assets1.csc.com/de/downloads/CSC_policy_paper_series_01_2011_unchartered_waters_state_of_open_data_europe_English_2.pdf
  3. YIU, Chris. A right to data: fulfilling the promise of open public data in the UK [online]. Research note. March 6th, 2012 [cit. 2012-03-06]. Available from WWW: http://www.policyexchange.org.uk/publications/category/item/a-right-to-data-fulfilling-the-promise-of-open-public-data-in-the-uk
  4. GIGLER, Bjorn-Soren; CUSTER, Samantha; RAHEMTULLA, Hanif. Realizing the vision of open government data: opportunities, challenges and pitfalls [online]. World Bank, 2011 [cit. 2012-04-11]. Available from WWW: http://www.scribd.com/WorldBankPublications/d/75642397-Realizing-the-Vision-of-Open-Government-Data-Long-Version-Opportunities-Challenges-and-Pitfalls
  5. LATHROP, Daniel; RUMA, Laurel (eds.). Open government: collaboration, transparency, and participation in practice. Sebastopol: O’Reilly, 2010. ISBN 978-0-596-80435-0.
  6. VICKERY, Graham. Review of the recent developments on PSI re-use and related market developments [online]. Final version. Paris, 2011 [cit. 2012-04-19]. Available from WWW: http://ec.europa.eu/information_society/policy/psi/docs/pdfs/report/psi_final_version_formatted.docx
  7. FRANCOLI, Mary. What makes governments ‘open’?: sketching out models of open government. eJournal of eDemocracy and Open Government [online]. 2011 [cit. 2012-03-15], vol. 3, no. 2, p. 152 — 165. ISSN 2075-9517. Available from WWW: http://www.jedem.org/issue/view/5
  8. SOLDA-KUTZMANN, Donatella. Public sector information: a market without failure? In Share-PSI Workshop: Removing the Roadblocks to a Pan-European Market for Public Sector Information Re-use [online]. 2011 [cit. 2012-03-09]. Available from WWW: http://share-psi.eu/submitted-papers/
  9. TAUBERER, Joshua. Open government data: principles for a transparent government and an engaged public [online]. 2012 [cit. 2012-03-09]. Available from WWW: http://opengovdata.io/
  10. Beyond access: open government data & the right to (re)use public information [online]. Access Info Europe, Open Knowledge Foundation, January 7th, 2011 [cit. 2012-04-15]. Available from WWW: http://www.access-info.org/documents/Access_Docs/Advancing/Beyond_Access_7_January_2011_web.pdf

2012-08-01

Legal aspects of public sector information

The following post is an excerpt from my thesis entitled Linked open data for public sector information.
Public sector information is subject to jurisprudence based on different sources of law and regulations endowed with legal power. The law relevant for the disclosure of public sector information comes from multiple regulators and as such is a combination of both international law, including conventions or EU directives, and national law [1]. As a result of this state of affairs, the conditions governing public sector information may be composed of rules coming from multiple layers. In effect, the legislation related to public sector information may pose equivocal requirements and ordinances that are difficult to adhere to.
The right of access to public sector information stems from a basic human freedom to seek and impart information. The right to information is enshrined in at least 50 national constitutions [2, p. 62]. Dedicated acts formalizing the right of access to public sector information are established in many of the countries that acknowledge the freedom of information.
The first legal act on access to public sector information, entitled “Freedom of the Press Act”, was passed in 1766 in Sweden [2, p. 57]. The right to know about the proceedings of the public sector was recognized as early as 1969 by the Japanese Supreme Court [3]. Other countries followed suit by establishing the right to know and access to information as a part of citizens’ rights. During the following decades the adoption was rather slow and in the mid-1980s only 11 countries had freedom of information laws [4, p. 264]. However, this area then experienced a sudden growth of interest paired with an increasing number of countries recognizing the importance of access to information. By 2004 the number of countries that had enacted a freedom of information law increased to 59 [Ibid., p. 264].
The prevailing presumption in favour of secrecy has shifted to a presumption favouring maximum disclosure and public sector information that is open by default [5, p. 23]. In many countries, the default settings for access to public sector information have changed. Accessing public sector information is no longer perceived as a privilege; it is a right [6, p. 8].
My thesis focuses on the legal situation for public sector information in the European Union and its member countries. The EU legislation is most relevant for the European context, in which the thesis is situated, and it can prove to be a valid model for an official public policy that establishes rules for the domain of public sector information. Thus, the EU legislation will be treated in more detail.

Legislation in the European Union

In the EU, public sector information legislation consists of EU directives and their local transpositions that weave the directives’ regulations into the state law of the member countries. A key directive covering public sector information is the EU directive on the re-use of public sector information [7]. The directive “establishes a minimum set of rules governing the re-use and the practical means of facilitating re-use of existing documents held by public sector bodies” [Ibid., p. 93]. It requires public bodies to provide a mechanism for members of the public to request access to information produced by the bodies. The overarching tenet of the directive is non-discrimination, which manifests itself in stipulations including the prohibition of exclusive arrangements that grant a particular entity exclusive rights of access, and the recommendation of marginal cost charging.
The planned amendment of the directive [8] extends the scope of public sector information to include the information from the cultural heritage sector, such as libraries, archives, and museums. Furthermore, it strives to conflate the right to access with the right to reuse. It brings about a change in the charging models that declares the marginal cost of reproduction as a new default, while requiring public bodies that continue to charge more to provide a solid explanation for their behaviour. The amendment also deals with the enforcement of the directive and proposes to set up an independent authority to oversee compliance with the principles of disclosure.

References

  1. KOUMENIDES, Christos L.; SALVADORES, Manuel; ALANI, Harith; SHADBOLT, Nigel R. Global integration of public sector information. In Proceedings of the WebSci10: Extending the Frontiers of Society On-line, April 26 — 27th, 2010, Raleigh (NC), US. Raleigh, 2010.
  2. Beyond access: open government data & the right to (re)use public information [online]. Access Info Europe, Open Knowledge Foundation, January 7th, 2011 [cit. 2012-04-15]. Available from WWW: http://www.access-info.org/documents/Access_Docs/Advancing/Beyond_Access_7_January_2011_web.pdf
  3. MENDEL, Toby. Freedom of information: an internationally protected human right. Comparative Media Law Journal. 2003, no. 1. Also available from WWW: http://www.juridicas.unam.mx/publica/rev/comlawj/cont/1/cts/cts3.htm
  4. BERTOT, John C.; JAEGER, Paul T.; GRIMES, Justin M. Using ICTs to create a culture of transparency: e-government and social media as openness and anti-corruption tools for societies. Government Information Quarterly. July 2010, vol. 27, iss. 3, p. 264 — 271. DOI 10.1016/j.giq.2010.03.001.
  5. LATHROP, Daniel; RUMA, Laurel (eds.). Open government: collaboration, transparency, and participation in practice. Sebastopol: O’Reilly, 2010. ISBN 978-0-596-80435-0.
  6. KUNDRA, Vivek. Digital fuel of the 21st century: innovation through open data and the network effect [online]. President and Fellows of Harvard College, 2012 [cit. 2012-03-15]. Discussion Paper Series, no. D-70. Available from WWW: http://shorensteincenter.org/wp-content/uploads/2012/03/d70_kundra.pdf
  7. EU. Directive 2003/98/EC of the European Parliament and of the Council of 17 November 2003 on the re-use of public sector information. Official Journal of the European Union. 2003, vol. 46, L 345, p. 90 — 96. Also available from WWW: http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2003:345:0090:0096:EN:PDF. ISSN 1725-2555.
  8. EU. Proposal for a Directive of the European Parliament and of the Council amending Directive 2003/98/EC on re-use of public sector information [online]. Brussels, December 12th, 2011 [cit. 2012-04-30]. COM (2011) 877. 2011/0430/COD. Available from WWW: http://ec.europa.eu/information_society/policy/psi/docs/pdfs/directive_proposal/2012/proposal_directive.pdf

2012-07-31

What is public sector information?

The following post is an excerpt from my thesis entitled Linked open data for public sector information.
Access to proceedings of the public sector is a fundamental underpinning of democracy. “Quality of public discussion would be significantly impoverished without the nourishment of information from public authorities” [1]. Moreover, economic and research activities in the private sector would be vastly impoverished if public sector information were kept concealed within the public sector. Reuse of public sector information in the private sector is a pivotal goal of its disclosure.
The disclosure of public sector information constitutes the subject matter of my thesis. In this blog post I try to delineate the scope of the domain described in the thesis by providing its basic conceptualization, along with lexical and extensional definitions of the concepts involved. This introductory post is therefore concerned with definitions, describing what the concept of “public sector information” covers.
First, how can the borders of the public sector be circumscribed? Boundaries of the public sector are demarcated by private ownership. The institutions the public sector consists of are not private property [2, p. 5]. Instead, the public sector is publicly owned.
Other definitions of the public sector employ the viewpoints of policy control or financial control. A common way to define the public sector in law is to use an extensional definition that enumerates the public bodies falling within its scope.
However, the boundary between public and private sector is getting blurry, since a lot of the functions traditionally performed by public bodies have been outsourced within public-private partnerships. The public sector may also start to take on some characteristics of the private sector, such as the models of finance management.
The public sector is made up of public bodies. A public body is an institution with legal personality that belongs to the public sector. It is set up under law by the state or another public sector body. Public bodies are established for the specific purpose of meeting needs in the general interest. They do not have a commercial character, and so the majority of their budgets is funded from tax revenue [3, p. 55]. Among the public bodies deemed most important from the perspective of the data they produce are cadastral offices, mapping agencies, statistical offices, and company registrars [4, p. 10].
Public bodies produce public sector information, or public data, which is the subject matter of this chapter. The UK Public Data Transparency Principles offer a working definition of “public data”. Public data is thought of as “the objective, factual, non-personal data on which public services run and are assessed, and on which policy decisions are based, or which is collected or generated in the course of public service delivery”. It is usually a by-product of the delivery of functions of public sector bodies, which makes it serve as an official public record as well [5]. The term “public sector data” is in most contexts used in the same way as “government data”, and the two can thus be treated as synonymous.
Given the generic definition of public sector information, enumerating all of the types of public data would be unnecessary. Instead, a few prototypical examples will be mentioned. In 2010, a survey by Socrata identified several high-value categories of data. Among the top-ranked categories were data about public safety, revenues and expenditures, and education. The most commonly used data categories in publicdata.eu, a catalogue of Europe’s public data, are “Finance and budgeting”, “Social questions”, and “Education and communication”. Other frequently mentioned types of public data are statistical and geospatial data, which are particularly important from the perspective of their reuse by businesses. Paul Clarke sorted public data into four categories:
  • Historical data, such as statistics
  • Planning data, including legal regulations in progress
  • Infrastructural data, for example, reference concepts such as postcodes
  • Operational data, covering real-time streaming data, e.g., traffic situation
Governments collect data on a plethora of topics, some of which may look obscure, such as the statistics of people injured by vending machines in the US [6]. Nevertheless, the collection of every dataset should be justified by its function in fulfilling the requirements of the public task and by its contribution as a source of improvements, such as increasing the safety of vending machines in the aforementioned example. The scope of public sector information follows the function of the public sector.

References

  1. MENDEL, Toby. Freedom of information: an internationally protected human right. Comparative Media Law Journal. 2003, no. 1. Also available from WWW: http://www.juridicas.unam.mx/publica/rev/comlawj/cont/1/cts/cts3.htm
  2. LIENERT, Ian. Where does the public sector end and the private sector begin? [online]. June 1st, 2009 [cit. 2012-04-29]. IMF working paper, no. 09/122. Available from WWW: http://www.imf.org/external/pubs/ft/wp/2009/wp09122.pdf
  3. The Council of the European Communities. Council Directive 93/37/EEC of 14 June 1993 concerning the coordination of procedures for the award of public works contracts. Official Journal of the European Communities. August 9th, 1993, vol. 36, L 199, p. 54–84. Also available from WWW: http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:31993L0037:EN:PDF. ISSN 0378-6978.
  4. VICKERY, Graham. Review of the recent developments on PSI re-use and related market developments [online]. Final version. Paris, 2011 [cit. 2012-04-19]. Available from WWW: http://ec.europa.eu/information_society/policy/psi/docs/pdfs/report/psi_final_version_formatted.docx
  5. American Library Association. Key principles of government information [online]. Chicago, 1997 — 2012 [cit. 2012-04-07]. Available from WWW: http://www.ala.org/advocacy/govinfo/keyprinciples
  6. LOVLEY, Erika. The government has a database for most everything. Politico [online]. June 24th, 2009 [cit. 2012-04-07]. Available from WWW: http://www.politico.com/news/stories/0609/24118.html

2012-07-30

Linked open data for public sector information: sharing my thesis

The public sector records data about what it does and about the environment in which it operates. Nowadays, improved and automated ways of data collection lead to growth in the volume of data available in the public sector. Digitization makes it possible to store the recorded data in a way that scales. Presently, researchers estimate that more than 5 exabytes are stored online every day [1]. Fortunately, there are scalable technologies for data storage and retrieval at our disposal.

The Web enables zero-cost reproduction of digital information, which makes it possible to share the information in a frictionless manner. Building on the premise that data deemed useful for the public sector is useful for the private sector as well, online exchange of public sector data makes it possible to maximize its value by reaching members of the public who may recycle it and reuse it for their own purposes. In fact, the increased access to and reuse of disclosed public data is driven by the technologies that make it feasible [2].

Digital data may be represented in structured ways that make it machine-readable. Raw, machine-readable representations of data are amenable to automated processing and retain the generative value of data, so that people and computers can use the data in ways that were not predefined. Machine readability makes possible a wide array of interactions with data that go far beyond displaying it. In this way, disclosure of public sector data in a machine-readable format allows members of the public to find new uses for the data.
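
To make the point about non-predefined use a little more concrete, here is a minimal sketch in Python. It assumes a tiny CSV file of spending records; the column names, the figures and the helper `totals_by` are all invented for illustration and do not come from any real dataset. The same raw rows are summarized in two different ways, neither of which had to be anticipated by whoever published the data.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data: column names and amounts are made up.
RAW_CSV = """department,category,amount
Transport,roads,1200
Transport,rail,800
Education,schools,1500
Education,libraries,300
"""


def totals_by(rows, key):
    """Sum the 'amount' column, grouped by the given column name."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += int(row["amount"])
    return dict(totals)


# Parse the machine-readable representation once...
rows = list(csv.DictReader(io.StringIO(RAW_CSV)))

# ...and reuse it for two different views of the same data.
by_department = totals_by(rows, "department")
by_category = totals_by(rows, "category")
```

Had the same figures been published only as a rendered table inside a web page, each of these views would first require scraping and cleaning the data.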

Adoption of the available technologies for data representation and storage may prove to have a disruptive effect on the public sector. Graham Vickery emphasizes two technological developments that, in his opinion, completely redefined the possibilities for public sector information [3, p. 6]. First, he points to the technologies that enable digitization of public resources. Second, he highlights the role of broadband telecommunications that enable better access to public sector information.

The technologies for representing and exchanging data constitute the basic components for open disclosure of data. Open access to public sector data is considered a key ingredient of a government that is open. Open government is “the notion that the people have the right to access the documents and proceedings of government” [4, p. xix], which is necessary for an open society that “reflects the universal values of intellectual autonomy, equality and trust” [5, p. 8]. Coupled with the demand for openness of the public sector, these technologies stimulated numerous initiatives promoting open data worldwide. Open data is a set of practices for data disclosure that strives to provide equal access to the data and equal use of it.

The foundations of open data draw from related approaches. Driven by the recognition of freedom of information as a basic human right, open data transposes the principles of open access, close to those of open source, onto data. It complements the adoption of e-government, which promotes the use of information and communication technologies to improve government processes, and coincides with the call for government 2.0, which makes better use of online collaborative technologies to create a more participatory government.

The application of open data, and more specifically linked open data, to the information held by public sector bodies constitutes the main theme of my diploma thesis titled Linked open data for public sector information, excerpts of which I am going to share here in the form of blog posts. I have decided to publish it in this way because it allows me to share short and focused pieces on specific topics rather than just publishing the whole thesis. I think of it as re-contextualization: information flows differently on the Web than in academia.

In the thesis, public sector information represents the content, to which the principles of open data are applied using the technologies recommended by the linked data publication model. The goal of my thesis is twofold. The first part explores the competitive advantage of linked data for release of public sector information under the terms of open data principles. The second part extrapolates the impact and challenges associated with the adoption of linked open data for public sector information.

I hope you will find it useful.

You can find the original fulltext of the thesis here.

Table of contents

  1. What is public sector information?
  2. Legal aspects of public sector information
  3. Disclosure of public sector information
  4. Pricing models for disclosure of public sector information
  5. Concepts of open data
  6. Legal openness of data
  7. Licences for open data
  8. Principles of open data: accessibility
  9. Principles of open data: use
  10. Qualities of open data
  11. Open data policies
  12. Open data for public sector information
  13. Open data infrastructure of the public sector
  14. Open data as a platform
  15. What is linked data?
  16. Technologies of linked data: URIs
  17. Technologies of linked data: HTTP
  18. Technologies of linked data: RDF
  19. Linked data principles
  20. Linked data: discoverability
  21. Linked data: accessibility
  22. Linked data: permanence
  23. Linked data: use
  24. Linked data: quality
  25. Linked open data in the public sector
  26. Impact of open data
  27. Impacts of open data: transparency
  28. Impacts of open data: accountability
  29. Impacts of open data: efficiency
  30. Impacts of open data: disintermediation
  31. Impacts of open data: participation
  32. Impacts of open data: business
  33. Impacts of open data: journalism
  34. Challenges of open data
  35. Challenges of open data: implementation
  36. Challenges of open data: information overload
  37. Challenges of open data: usability
  38. Challenges of open data: data literacy
  39. Challenges of open data: misinterpretation
  40. Challenges of open data: privacy
  41. Challenges of open data: data quality
  42. Challenges of open data: trust
  43. Challenges of open data: procured data
  44. Challenges of open data: summary

References

  1. WRUUCK, Patricia. 2012: the year of big data. European Public Policy Blog [online]. Brussels, May 1st, 2012 [cit. 2012-05-01]. Available from WWW: http://googlepolicyeurope.blogspot.com/2012/05/2012-year-of-big-data.html
  2. BERNERS-LEE, Tim; SHADBOLT, Nigel. Our manifesto for government data. Guardian Datablog [online]. January 21st, 2010 [cit. 2012-04-07]. Available from WWW: http://www.guardian.co.uk/news/datablog/2010/jan/21/timbernerslee-government-data
  3. VICKERY, Graham. Review of the recent developments on PSI re-use and related market developments [online]. Final version. Paris, 2011 [cit. 2012-04-19]. Available from WWW: http://ec.europa.eu/information_society/policy/psi/docs/pdfs/report/psi_final_version_formatted.docx
  4. LATHROP, Daniel; RUMA, Laurel (eds.). Open government: collaboration, transparency, and participation in practice. Sebastopol: O'Reilly, 2010. ISBN 978-0-596-80435-0.
  5. HALONEN, Antti. Being open about data: analysis of the UK open data policies and applicability of open data [online]. Report. London: Finnish Institute, 2012 [cit. 2012-04-05]. Available from WWW: http://www.finnish-institute.org.uk/images/stories/pdf2012/being%20open%20about%20data.pdf

2012-03-04

Opening contracted data in the public sector

The public sector sucks at making applications. Look at what applications it creates and look at what applications are created in the private sector, such as e-banking. The difference is huge. A common argument in favour of open government data follows this line of reasoning. The public sector is not able to create useful applications in a cost-efficient way, therefore it should openly publish its data and the applications will flow, produced by members of the public, for free.

See, the problem is that the public sector also sucks at making some data. Some types of data, such as geographical data or extensive surveys, are quite difficult to gather by the means available to the public sector. The solution is to sign a contract with a company that produces the requested data. By outsourcing the acquisition of some types of data, the public sector gets what it needs for its functioning. No problem so far.

The problem starts to appear when the companies (often unlike the public sector) see the possibilities for reuse of the data. The companies producing the data are well aware of the ways in which their product can be reused by businesses to generate revenue. It would be stupid of a company to give the public sector exclusive rights to the contracted data when the data can be resold to other companies. For example, a company producing geospatial data for the cadaster may sell the same data to businesses producing maps. Of course, the public sector might want to get a licence permitting it to redistribute the data, but a contract containing such a requirement would be priced much higher by the supplier, because the supplier would be deprived of the additional income from reselling the data. Opening highly reusable data might be pricey.

It leaves me with a lot of questions about the best way to open data that the public sector acquires from a commercial supplier who is conscious of the real value of the data and reflects it in the price.

At the beginning of the #opendata film, Tom Steinberg from MySociety says:

Open Government Data is any information the Government collects, by and large for their own purposes, that it then makes available for other people to use for their purposes.

The definition of open government data concerns the data that are collected by the government. However, it is not clear whether it covers only data produced by the government itself or whether it also includes data supplied to the government by a third party. Does the definition of open government data apply even to data that are the result of a public contract? If this is a correct interpretation, is it nevertheless the responsibility of the public sector to contract data in a way that allows the data to be released as open data, even though it might significantly raise the price of the contract? Spending government money on this would certainly lower the barrier to starting a business based on such data. And, given its financial constraints, can the public sector afford to contract data in this way?

Acknowledgements: Thanks to Jáchym Čepický for bringing this point to our seminar Open Data and Public Sector: applying Austrian experience in the Czech Republic.