2012-09-07

Challenges of open data: misinterpretation

The following post is an excerpt from my thesis entitled Linked open data for public sector information.
Another argument pointing at the potential risks of disclosing public data was presented by Lawrence Lessig in an article titled Against transparency [1], in which he draws attention to the adverse effects of misinterpretation of public data. He highlights the issues that arise when the monopoly on interpretation is removed and members of the public are provided with raw, uninterpreted data [2, p. 2]. Disintermediation causes decontextualization of public sector data, which may lead to highly divergent interpretations of the same data [3]. Such a change may be perceived as a loss of the control civil servants used to have. Instead of an “official” interpretation of open data, there would potentially be a plurality of “competing” and possibly conflicting interpretations, some of which may be driven by malicious interests.
Lessig, pointing to the allegedly shortening attention spans of members of the public, claims that it is easier to arrive at an incorrect judgement based on public data than at one based on solid understanding [1]. The ability to interpret data correctly is largely confined to people with sufficient expertise and data literacy skills. Moreover, Archon Fung and David Weil argue that the way open data is disclosed is conducive to a pessimistic view of the public sector. They claim that “the systems of open government that we’re building - structures that facilitate citizens’ social and political judgments - are much more disposed to seeing the glass of government as half or even one-quarter empty, rather than mostly full” [4, p. 107]. Such conditions may also make users of data susceptible to apophenia, the phenomenon of seeing patterns that do not actually exist [5, p. 2]. In fact, Lessig writes, confronted with the vast amounts of available public data, ignorance is a rational investment of attention [1]. Without a significant investment of time and data literacy skills, people will usually come to shallow and premature conclusions based on their examination of public data. Unfounded conclusions may be quickly adopted and spread by the media, causing significant harm to the reputation of public sector bodies, civil servants, or politicians until the assertions are re-examined and proven false. For example, unverified oversimplifications may be drawn from public data to support political campaigns. Open data can thus be misused for skewed interpretations supporting political actions, casting suspicion on the public image of politicians targeted by discreditation campaigns.
Misinterpretations may increase distrust in the public sector. Thus, Lessig makes the case for limiting the disclosure of public data prone to misinterpretation [Ibid.]. Even though he does not completely oppose transparency initiatives, he warns that careful consideration should be given to releasing sensitive information that may be misused for defamation.
Unrestricted access to the communication channels provided by new media gives a strong voice to all competing interpretations, unhindered by the filtering mechanisms of traditional publishing. This state of affairs allows unfounded claims and rumours to amplify and spread with an impact that was previously impossible to achieve, harming personal reputations and the public image of government. Fortunately, the self-repairing properties of communication networks eventually lead to the rebuttal of misinformation. The openness of public data thus brings not only greater scrutiny of the public sector, but indirectly also better control of unproven claims.

References

  1. LESSIG, Lawrence. Against transparency: the perils of openness in government. The New Republic [online]. October 9th, 2009 [cit. 2012-03-29]. Available from WWW: http://www.tnr.com/article/books-and-arts/against-transparency
  2. DAVIES, Tim. Open data, democracy and public sector reform: a look at open government data use from data.gov.uk [online]. Based on an MSc Dissertation submitted for examination in Social Science of the Internet, University of Oxford. August 2010 [cit. 2012-03-09]. Available from WWW: http://www.opendataimpacts.net/report/wp-content/uploads/2010/08/How-is-open-government-data-being-used-in-practice.pdf
  3. KAPLAN, Daniel. Open public data: then what? Part 1 [online]. January 28th, 2011 [cit. 2012-04-10]. Available from WWW: http://blog.okfn.org/2011/01/28/open-public-data-then-what-part-1/
  4. LATHROP, Daniel; RUMA, Laurel (eds.). Open government: collaboration, transparency, and participation in practice. Sebastopol: O'Reilly, 2010. ISBN 978-0-596-80435-0.
  5. BOYD, Danah; CRAWFORD, Kate. Six provocations for big data. In Proceedings of A Decade in Internet Time: Symposium on the Dynamics of the Internet and Society, 21 — 24 September 2011, University of Oxford. Oxford (UK): Oxford University, 2011. Also available from WWW: http://ssrn.com/abstract=1926431

2012-09-06

Challenges of open data: data literacy

The following post is an excerpt from my thesis entitled Linked open data for public sector information.
Even though open data bridges the data divide between the public sector and members of the public, it might be introducing a new data divide separating those with the resources to make use of the data from those without them. Although open data virtually eliminates the cost of data acquisition, the cost of use remains “sufficiently high to compromise the political impact of open data” [1, p. 11].
An oft-cited quote attributed to Francis Bacon claims that “knowledge is power”. If data is a source of knowledge, then opening it up creates a shift in access to a source of power. However, equal access to data implies neither equal use nor equal empowerment, as transforming data into power requires more than access. Leaving aside the concerns of unequal access addressed by the agenda of the digital divide, the principles of open data remove barriers to access, but they do not remove all barriers to use. In this respect, it is vitally important to distinguish between the “opportunity” and the actual “realization” of use of open data [2]. Even though everyone may have equal opportunities to access and use open data, only some are able to achieve “effective use” [Ibid.]. In the light of this assertion, open data empowers only the already empowered: those who have access to the technologies and computer skills necessary to make use of the data.
The belief in the transformative potential of open data is based on optimistic assumptions about citizens’ data literacy. The technocratic perspective from which open data principles are drafted takes for granted the high level of skill necessary for working with data. Thus, open data initiatives are in a way exclusive, as they are limited mostly to technically inclined citizens [3, p. 268].
The minimalist role of the public sector, withdrawn into the background to serve as a platform, proceeds from the supposition that members of society have all the ingredients necessary to make effective use of open government data, such as a high level of information processing capability [4]. Even though ICT penetration and internet connectivity may be sufficient to access open data, they are not enough to make use of it. What is also needed are the abilities to process and interpret the data. However, open data released in a raw form may not be easily digestible without substantial proficiency in data processing. The technical expertise required of users to process the data should therefore not be underestimated.
The bottom line is that access to data may in fact increase asymmetry in society. If all interest groups have equal access to public sector information, we can expect the better organized and better-equipped groups to make better use of it [5]. The asymmetry may stem from the fact that the interest groups able to take advantage of the newly released information will prosper at the expense of the groups that cannot.
On the other hand, this type of inequality is in a sense natural. Such a state of affairs should not be considered final, but rather a starting point. David Eaves compares the challenge of increasing data literacy to increasing literacy through libraries and reminds us that “we didn’t build libraries for an already literate citizenry. We built libraries to help citizens become literate” [6]. In the same way, we do not publish open data expecting everyone to be able to use it. The data is released because access is a necessary prerequisite for use. Direct access to data by empowered, technically skilled infomediaries may become the basis of indirect access for many more [7]. From this perspective, the most effective uses of open data are those that let others make effective use of the data.

References

  1. MCCLEAN, Tom. Not with a bang but with a whimper: the politics of accountability and open data in the UK. In HAGOPIAN, Frances; HONIG, Bonnie (eds.). American Political Science Association Annual Meeting Papers, Seattle, Washington, 1 — 4 September 2011 [online]. Washington (DC): American Political Science Association, 2011 [cit. 2012-04-19]. Also available from WWW: http://ssrn.com/abstract=1899790
  2. GURSTEIN, Michael. Open data: empowering the empowered or effective data use for everyone? First Monday [online]. February 7th, 2011 [cit. 2012-04-01], vol. 16, no. 2. Available from WWW: http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/3316/2764
  3. BERTOT, John C.; JAEGER, Paul T.; GRIMES, Justin M. Using ICTs to create a culture of transparency: e-government and social media as openness and anti-corruption tools for societies. Government Information Quarterly. July 2010, vol. 27, iss. 3, p. 264 — 271. DOI 10.1016/j.giq.2010.03.001.
  4. GIGLER, Bjorn-Soren; CUSTER, Samantha; RAHEMTULLA, Hanif. Realizing the vision of open government data: opportunities, challenges and pitfalls [online]. World Bank, 2011 [cit. 2012-04-11]. Available from WWW: http://www.scribd.com/WorldBankPublications/d/75642397-Realizing-the-Vision-of-Open-Government-Data-Long-Version-Opportunities-Challenges-and-Pitfalls
  5. SHIRKY, Clay. Open House thoughts, Open Senate direction. In Open House Project [online]. November 23rd, 2008 [cit. 2012-04-19]. Available from WWW: http://groups.google.com/group/openhouseproject/msg/53867cab80ed4be9
  6. EAVES, David. Learning from libraries: the literacy challenge of open data [online]. June 10th, 2010 [cit. 2012-04-11]. Available from WWW: http://eaves.ca/2010/06/10/learning-from-libraries-the-literacy-challenge-of-open-data/
  7. TAUBERER, Joshua. Open government data: principles for a transparent government and an engaged public [online]. 2012 [cit. 2012-03-09]. Available from WWW: http://opengovdata.io/

2012-09-05

Challenges of open data: usability

The following post is an excerpt from my thesis entitled Linked open data for public sector information.
If usability is considered a property of interfaces, raw data presents a difficult one. Data is largely too unwieldy to be used by most people. For example, 50 % of the respondents in Socrata’s open data study said that the data was unusable [1]. Poor usability may also account for the low level of use most open data sources receive.
The requirements for usability of open data reviewed in a previous blog post prove difficult to satisfy. The usability barrier may be especially high when dealing with linked open data, as reported in the previous post about the usability of linked data. Yet it is important not to sacrifice the generative potential of open data to the low usability of the underlying technologies.
The challenge of usability requires data producers to adopt a user-centric perspective. The following blog posts highlight the increased need for data literacy, which is necessary for interacting with open data, and warn of the dangers of incorrect interpretations drawn from data.

References

  1. Socrata. 2010 open government data benchmark study [online]. Version 1.4. Last updated January 4th, 2011 [cit. 2012-04-07]. Available from WWW:

2012-09-04

Challenges of open data: information overload

The following post is an excerpt from my thesis entitled Linked open data for public sector information.
As more and more data is released into the open, there is a growing danger that irrelevant data might flood the data that is important [1]. Only a few of the available datasets contain “actionable” information, and there is no effective filtering mechanism to track them down. With open data, “we have so many facts at such ready disposal that they lose their ability to nail conclusions down, because there are always other facts supporting other interpretations” [2].
The sheer volume of existing open data makes it difficult to comprehend. At such a scale, there is a need for tools that make large amounts of data intelligible. Edd Dumbill writes that “big data may be big. But if it’s not fast, it’s unintelligible” [3].
While human processing does not scale, machine processing does. Thus, the challenge of information overload highlights the need for machine-readable data. Big, yet sufficiently structured data may be automatically pre-processed and filtered down to “small data” that people can manage to work with. For example, linked data may be effectively filtered with precise SPARQL queries harnessing its rich structure.
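To make the filtering idea concrete, here is a toy sketch in plain Python over hypothetical triple-structured records (the subjects, predicates, and threshold are all illustrative, not taken from any real dataset); the list comprehension plays the role a SPARQL FILTER clause would play over an RDF graph.

```python
# Each record is a (subject, predicate, object) triple, mimicking the
# structure of RDF data. The dataset itself is made up.
triples = [
    ("contract/1", "amount", 500),
    ("contract/2", "amount", 15000),
    ("contract/3", "amount", 250000),
    ("contract/1", "supplier", "Acme Ltd"),
]

# Keep only contracts whose amount exceeds a threshold -- the "small
# data" a person can actually inspect.
large_contracts = [
    s for (s, p, o) in triples
    if p == "amount" and isinstance(o, int) and o > 10000
]
print(large_contracts)  # ['contract/2', 'contract/3']
```

The structure is what makes this possible: because each value is attached to an explicit predicate, the filter can target exactly the property of interest instead of scanning unstructured text.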
Scaling the processing of large amounts of machine-readable data with well-defined structure may be considered solved. However, the current challenge is to deal with the heterogeneity of data from different sources.

Heterogeneity

Not only is there a perceived information overload, there is also an overload of different and incompatible ways of representing information. What we have built out of different data formats and modelling approaches resembles the proverbial “Tower of Babel”. In this state of affairs, the data available on the Web constitutes a high-dimensional, heterogeneous data space.
Nonetheless, it is in managing heterogeneous data sources that linked data excels. Linking may be considered a lightweight, pay-as-you-go approach to the integration of disparate datasets [4]. Semantic web technologies also address the intrinsic heterogeneity of data sources by providing means to model varying levels of formality, quality, and completeness [5, p. 851].

Comparability

A key quality of data that suffers from heterogeneity is comparability. According to the SDMX content-oriented guidelines, comparability is defined as “the extent to which differences between statistics can be attributed to differences between the true values of the statistical characteristics” [6, p. 13]. In other words, it is the extent to which differences in data can be attributed to differences in the measured phenomena.
Improving the comparability of data hence means minimizing the unwanted interferences that skew it. Influences leading to the distortion of data may originate in differences between schemata, differing conceptualizations of the domains described in the data, or incompatible data handling procedures. Eliminating such influences maximizes the evidence in data, so that it reflects the observed phenomena more directly.
The importance of comparability surfaces especially in data analysis tasks. Insights yielded by analyses then feed into decision support and policy making. Comparability also supports the transparency of public sector data because it clears the view of public administration. It allows easier audits of public sector bodies, thanks to the possibility of abstracting away from the ways the data was collected. Incomparable data, on the other hand, corrupts the monitoring of public sector bodies, and imprecise monitoring leaves ample space for systemic inefficiencies and potential corruption.
The publication model of linked data has built-in comparability features, which stem from its requirement to use common, shared standards. RDF provides a commensurate structure through its data model, to which linked data is required to conform. The emphasis on reusing shared conceptualizations, such as RDF vocabularies, ontologies, and reference datasets, provides for comparable data content.
In the network of linked data, the “bandwagon” effect increases the probability of adoption of a set of core reference datasets, which further reinforces the positive feedback loop. Core reference data may be used to link other datasets and thereby enhance their value. Such datasets attract the most in-bound links, which leads to the emergence of “linking hubs”. These de facto reference datasets derive their status from their highly reusable content. An example of this type of dataset is DBpedia, which provides machine-readable data based on Wikipedia. Its prominence may be illustrated by the Linked Open Data Cloud, in the center of which it is positioned, indicating the high number of datasets linking to it.
In contrast to these datasets, traditional reference sources are established through the authority of their publishers, which is reflected in policies prescribing their use. Datasets of this type include knowledge organization systems, such as classifications and code lists, that offer shared conceptualizations of particular domains. A prototypical example of an essential reference dataset is the International System of Units, a source of shared units of measurement. In contrast with the linking hubs of linked data, traditional reference datasets are, for the most part, not available in RDF and are therefore not linkable.
The effect of using both kinds of reference data is the same. The conceptualizations they construct offer reference concepts that make the data referring to them comparable. A trivial example illustrating this point is the use of the same units of measurement, which makes it possible to sort data in the expected order.
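As a small illustration of shared units of measurement enabling comparison, the following sketch (the datasets are hypothetical; the foot-to-metre factor is the SI definition) converts lengths reported in different units to a shared reference unit, metres, after which the observations can be sorted in the expected order.

```python
# Conversion factors to the shared reference unit (metres).
# 1 ft = 0.3048 m by definition; the observations themselves are made up.
TO_METRES = {"m": 1.0, "km": 1000.0, "ft": 0.3048}

observations = [
    ("road A", 1.2, "km"),
    ("road B", 900, "m"),
    ("road C", 5000, "ft"),
]

# Normalize every observation to metres, then sort by length.
in_metres = [(name, value * TO_METRES[unit]) for name, value, unit in observations]
in_metres.sort(key=lambda pair: pair[1])
```

Normalized this way, road B (900 m) precedes road A (1 200 m) and road C (1 524 m); sorting on the raw, incomparable values would have put road A first.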
Data might need to be converted prior to comparison with other datasets. In this case, comparability is needed at the level of the data the incomparable datasets refer to. Linked data makes this possible through linking, the same technique it applies to data integration. With techniques such as ontology alignment, mappings between reference datasets may be established to serve as proxies for the purpose of data comparison. Ultimately, the machine-readable relationships in linked data make it outperform other ways of representing data when it comes to the ability to draw comparisons.

References

  1. FIORETTI, Marco. Open data, open society: a research project about openness of public data in EU local administration [online]. Pisa, 2010 [cit. 2012-03-10]. Available from WWW: http://stop.zona-m.net/2011/01/the-open-data-open-society-report-2/
  2. WEINBERGER, David. Too big to know. New York (NY): Basic Books, 2012. ISBN 978-0-465-02142-0.
  3. DUMBILL, Edd (ed.). Planning for big data: a CIO’s handbook to the changing data landscape [ebook]. Sebastopol: O’Reilly, 2012, 83 p. ISBN 978-1-4493-2963-1.
  4. HEATH, Tom; BIZER, Chris. Linked data: evolving the Web into a global data space. 1st ed. Morgan & Claypool, 2011. Also available from WWW: http://linkeddatabook.com/book. ISBN 978-1-60845-430-3. DOI 10.2200/S00334ED1V01Y201102WBE001.
  5. SHADBOLT, Nigel; O’HARA, Kieron; SALVADORES, Manuel; ALANI, Harith. eGovernment. In DOMINGUE, John; FENSEL, Dieter; HENDLER, James A. (eds.). Handbook of semantic web technologies. Berlin: Springer, 2011, p. 849 — 910. DOI 10.1007/978-3-540-92913-0_20.
  6. SDMX. SDMX content-oriented guidelines. Annex 1: cross-domain concepts. 2009. Also available from WWW: http://sdmx.org/wp-content/uploads/2009/01/01_sdmx_cog_annex_1_cdc_2009.pdf

2012-09-03

Challenges of open data: implementation

The following post is an excerpt from my thesis entitled Linked open data for public sector information.
Data publishers may perceive the adoption of linked open data as having daunting entry barriers. In particular, they are aware of the high demands on expertise for publishing linked data, which is reputed to have a steep learning curve. The linked data publishing model poses requirements that may seem difficult to meet. The Frequently Observed Problems on the Web of Data [1] testify to that.
Therefore, “it is vital to follow a realistic, practical and inexpensive approach” [2]. Fortunately, linked data facilitates incremental, evolutionary information management. Its deployment may follow a step-by-step approach, adopting iterative development for continuous improvement. For example, before switching database technology, linked data publishers could start by caching their legacy databases in triple stores. Another way to cushion the demands of linked data adoption is to minimise the ontological commitment by creating small ontologies that may be gradually linked together.
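The caching of legacy databases mentioned above can be sketched as a minimal extract-and-transform pass that exposes rows of a legacy table as subject-predicate-object triples ready for loading into a triple store. Everything here is hypothetical: the table, column names, and URI namespace are illustrative, not a prescribed mapping.

```python
# Rows as they might come from a legacy relational table (illustrative).
rows = [
    {"id": 1, "name": "Department of Health", "budget": 120},
    {"id": 2, "name": "Department of Transport", "budget": 85},
]

BASE = "http://data.example.org/agency/"  # assumed URI namespace

def row_to_triples(row):
    """Turn one relational row into subject-predicate-object triples."""
    subject = BASE + str(row["id"])
    for column, value in row.items():
        if column != "id":  # the primary key becomes the subject URI
            yield (subject, BASE + column, value)

triples = [t for row in rows for t in row_to_triples(row)]
```

The point is that nothing in the legacy database changes: the triples are derived on the side and can be regenerated whenever the source table changes, keeping the entry cost of adoption low.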
Two implementation challenges associated with the adoption of linked open data in the public sector will be dealt with in detail: resistance to change in the public sector and the maturity of the linked data technology stack.

Resistance to change

The rhetoric of open data supporters puts an emphasis on bureaucracy as a major barrier to opening data in the public sector. There is a tendency to frame the politics of access to data as a struggle between the public sector, which has an inbred attachment to secrecy, and members of the public, who are depicted as individuals rather than groups [3, p. 7].
While this view seems biased, institutional inertia may pose a challenge to the adoption of open data, which may require a “cultural change in the public sector” [4]. The transition from the status quo may be significantly hindered by the established culture of public administration. “A major impediment is an entrenched closed culture in many government organisations as a result of the fear of disclosing government failures and provoking political escalation and public outcry” [5]. The intangible problem of the closed mindset prevailing in the public sector proves difficult to resolve. And so, in many ways, the adoption of open data “isn’t a hardware retirement issue, it’s an employee retirement one” [6].
Resistance to change is not the only barrier hindering the adoption of open data. A hurdle commonly encountered by open data advocates is that civil servants perceive open data as an additional workload that lacks clear justification [7, p. 70]. Unlike citizens, who are allowed to do everything that is not prohibited, public servants are allowed to do only what laws and policies order them to do. Voluntary adoption of open data at the lower levels of public administration is thus highly unlikely; a policy is required to push open data through.
However, it may be the existing policies themselves that make the change difficult. In general, the public sector is subject to special obstacles that impede the adoption of new technologies. For example, the combination of strict data handling procedures and the constricted possibilities of a limited budget may effectively stop any technological change [7]. That is why there must be a strong commitment to open data at the upper levels of the public sector in order to push through the necessary amendments to existing data handling policies.

Technology maturity

Semantic web technologies underlying linked data were long thought of as not being ready for adoption in enterprise settings and in the public sector. In 2010, the linked data technology stack was not perceived as ready for large-scale adoption in the public sector. John Sheridan reported three key things missing [8]:
  • Repeatable design patterns
  • Supportive tools
  • Commoditization of linked data APIs
At that time, standards were mature enough, but their translation into repeatable design patterns applicable in practice was lacking. This has since changed. Several sources recommend established design patterns (e.g., [9], [10], [11]), supportive tools have been developed and packaged (e.g., the LOD2 Stack), and frameworks for developing custom APIs based on linked data have been created (e.g., the Linked Data API mentioned in a previous blog post). Linked data has matured progressively in recent years, and so it may be argued that it is ready to be implemented at the level of the public sector.

References

  1. HOGAN, Aidan; CYGANIAK, Richard. Frequently observed problems on the web of data [online]. Version 0.3. November 13th, 2009 [cit. 2012-04-23]. Available from WWW: http://pedantic-web.org/fops.html
  2. ALANI, Harith; CHANDLER, Peter; HALL, Wendy; O’HARA, Kieron; SHADBOLT, Nigel; SZOMSZOR, Martin. Building a pragmatic semantic web. IEEE Intelligent Systems. May—June 2008, vol. 23, iss. 3, p. 61 — 68. Also available from WWW: http://eprints.soton.ac.uk/265787/1/alani-IEEEIS08.pdf. ISSN 1541-1672. DOI 10.1109/MIS.2008.42.
  3. MCCLEAN, Tom. Not with a bang but with a whimper: the politics of accountability and open data in the UK. In HAGOPIAN, Frances; HONIG, Bonnie (eds.). American Political Science Association Annual Meeting Papers, Seattle, Washington, 1 — 4 September 2011 [online]. Washington (DC): American Political Science Association, 2011 [cit. 2012-04-19]. Also available from WWW: http://ssrn.com/abstract=1899790
  4. GRAY, Jonathan. The best way to get value from data is to give it away. Guardian Datablog [online]. December 13th, 2011 [cit. 2011-12-14]. Available from WWW: http://www.guardian.co.uk/world/datablog/2011/dec/13/eu-open-government-data
  5. VAN DEN BROEK, Tijs; KOTTERINK, Bas; HUIJBOOM, Noor; HOFMAN, Wout; VAN GRIEKEN, Stefan. Open data need a vision of smart government. In Share-PSI Workshop: Removing the Roadblocks to a Pan-European Market for Public Sector Information Re-use [online]. 2011 [cit. 2012-03-09]. Available from WWW: http://share-psi.eu/submitted-papers/
  6. DUMBILL, Edd (ed.). Planning for big data: a CIO’s handbook to the changing data landscape [ebook]. Sebastopol: O’Reilly, 2012, 83 p. ISBN 978-1-4493-2963-1.
  7. HALONEN, Antti. Being open about data: analysis of the UK open data policies and applicability of open data [online]. Report. London: Finnish Institute, 2012 [cit. 2012-04-05]. Available from WWW: http://www.finnish-institute.org.uk/images/stories/pdf2012/being%20open%20about%20data.pdf
  8. ACAR, Suzanne; ALONSO, José M.; NOVAK, Kevin (eds.). Improving access to government through better use of the Web [online]. W3C Interest Group Note. May 12th, 2009 [cit. 2012-04-06]. Available from WWW: http://www.w3.org/TR/egov-improving/
  9. SHERIDAN, John; TENNISON, Jeni. Linking UK government data. In BIZER, Christian; HEATH, Tom; BERNERS-LEE, Tim; HAUSENBLAS, Michael (eds.). Linked Data on the Web: proceedings of the WWW 2010 Workshop on Linked Data on the Web, April 27th, 2010, Raleigh, USA. Aachen: RWTH Aachen University, 2010. CEUR workshop proceedings, vol. 628. ISSN 1613-0073.
  10. DODDS, Leigh; DAVIS, Ian. Linked data patterns [online]. Last changed 2011-08-19 [cit. 2011-11-05]. Available from WWW: http://patterns.dataincubator.org
  11. HEATH, Tom; BIZER, Chris. Linked data: evolving the Web into a global data space. 1st ed. Morgan & Claypool, 2011. Also available from WWW: http://linkeddatabook.com/book. ISBN 978-1-60845-430-3. DOI 10.2200/S00334ED1V01Y201102WBE001.
  12. HYLAND, Bernardette; TERRAZAS, Boris Villazón; CAPADISLI, Sarven. Cookbook for open government linked data [online]. Last modified on February 20th, 2012 [cit. 2012-04-11]. Available from WWW: http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

2012-09-02

Challenges of open data

The following post is an excerpt from my thesis entitled Linked open data for public sector information.
Open data opens not only new opportunities but also new challenges. These challenges point to the limits of openness and to the shortcomings of the approaches used to put linked open data into practice in the public sector.
The top 10 barriers and potential risks for the adoption of open data in the public sector, compiled by Noor Huijboom and Tijs van den Broek [1, p. 7], comprise the following:
  • closed government culture
  • privacy legislation
  • limited quality of data
  • limited user-friendliness/information overload
  • lack of standardisation of open data policy
  • security threats
  • existing charging models
  • uncertain economic impact
  • digital divide
  • network overload
Some of these challenges will be discussed in detail in the following blog posts. In particular, this section will cover the difficulties that may be encountered during the implementation of linked open data, information overload and the problems of scalable processing of large, heterogeneous datasets, the usability of raw data, issues of personal data protection, deficiencies in data quality, adverse effects of open data on trust in the public sector, and finally the unresolved question of opening data obtained via public procurement.

References

  1. HUIJBOOM, Noor; VAN DEN BROEK, Tijs. Open data: an international comparison of strategies. European Journal of ePractice [online]. March/April 2011 [cit. 2012-04-30], no. 12. Available from WWW: http://www.epractice.eu/files/European%20Journal%20epractice%20Volume%2012_1.pdf. ISSN 1988-625X.

2012-09-01

Impacts of open data: journalism

The following post is an excerpt from my thesis entitled Linked open data for public sector information.
The availability of data and data processing tools gives birth to a new paradigm in journalism, commonly referred to as data-driven journalism. It refers to the practice of basing journalistic articles on hard data, which makes it possible to back up claims with well-founded evidence.
Unverified claims abound in traditional journalistic practice. To address this deficiency, data-driven journalism may employ open data sources to cross-verify claims. Data triangulation, combining disparate sources, may establish the validity of the verified claims.
If data-driven journalists strive to draw closer to objectivity, they need to share their sources to achieve transparency. Sharing the underlying data is an imperative of data-driven journalism, so that others can see what led to the insights conveyed in articles. In the light of such transparency, claims made by journalists may be verified by third parties and trust may be established.
The best-known examples of data-driven journalism include the Guardian’s Datablog and ProPublica.