
Introduction to Linked Data

In 2006 Tim Berners-Lee wrote an influential note setting out principles for publishing data on the semantic web.

Since then the volume of linked data has grown from around 2 billion RDF triples in 2007 to over 30 billion in 2011 (the last time this figure was computed). These triples are interconnected by over 500 million RDF links, whose main purpose is to establish chains of URIs that refer to the same individuals. Through such links, published datasets are combined into a vast body of data known as the linked data "cloud".


Figure 5: Linked data cloud (2007)
Source: http://lod-cloud.net
Citation: Linking Open Data cloud diagram (2007), by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net
License: CC-BY-SA

Figure 5 shows a diagram of the linked data cloud for 2007, in which nodes represent published datasets and links represent sets of RDF triples through which the URIs in one dataset are paired with their counterparts in another dataset. Thus the link from DBpedia to MusicBrainz means that DBpedia includes not only RDF triples that give information about the world, but also triples that link some DBpedia names to their synonyms in MusicBrainz. We saw examples of such statements in the last section, including a triple linking the two names for the Beatles.


Note that since the "sameAs" relation is transitive and symmetric, two statements of the form "X sameAs Y" and "Y sameAs Z" (or equivalently "Z sameAs Y") can be combined to infer "X sameAs Z"; in this way, lists of synonymous names can be derived from the cloud.
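Because sameAs is symmetric and transitive (and reflexive), the synonym lists it induces are equivalence classes. A minimal sketch of deriving them, assuming the sameAs statements have already been extracted as pairs of URI strings; the function name and the `ex:` URIs are hypothetical, and the union-find structure is just one convenient way to compute the closure, not something the linked data principles prescribe:

```python
from collections import defaultdict


def same_as_classes(pairs):
    """Group URIs into synonym classes, treating each (x, y) pair as the
    statement "x sameAs y" and closing it under symmetry and transitivity."""
    parent = {}

    def find(u):
        # Register unseen URIs as their own representative.
        parent.setdefault(u, u)
        root = u
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every visited URI straight at the root.
        while parent[u] != root:
            nxt = parent[u]
            parent[u] = root
            u = nxt
        return root

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for x, y in pairs:
        union(x, y)

    groups = defaultdict(set)
    for u in parent:
        groups[find(u)].add(u)
    # Sorted output so the result does not depend on insertion order.
    return sorted(sorted(g) for g in groups.values())


# "X sameAs Y" and "Z sameAs Y" put X, Y and Z in one class,
# which is exactly the inference "X sameAs Z" described above.
classes = same_as_classes([("ex:X", "ex:Y"), ("ex:Z", "ex:Y"), ("ex:P", "ex:Q")])
```

Here `classes` holds two synonym lists, one for P and Q and one for X, Y and Z; no explicit "X sameAs Z" triple is needed for X and Z to land together.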


In his 2006 note, Berners-Lee set out four simple principles for publishing data on the web. These are best seen as guidelines for good practice rather than rules that must be obeyed: the idea is that the more people follow these principles, the more their data will be usable by others.

In brief, the principles are as follows:

  1. Use URIs to identify things.
  2. Use HTTP URIs so that people can look up those names.
  3. When someone looks up a URI, provide useful information, using the standards (RDF, RDFS, SPARQL).
  4. Include links to other URIs, so that they can discover more things.
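Principles 2 and 3 can be illustrated from the client side. The sketch below assumes one common practice, HTTP content negotiation, in which the client asks for an RDF serialisation through the `Accept` header; the helper names, the example URI and the chosen media types are hypothetical, and the code only builds the request without sending it:

```python
from urllib.parse import urlparse
from urllib.request import Request

# Media types for two common RDF serialisations (an assumption: which
# formats a server actually honours is up to the data publisher).
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9"


def is_http_uri(uri):
    """Principle 2: the name should be an HTTP(S) URI with a host part,
    so that it can be looked up."""
    parts = urlparse(uri)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def lookup_request(uri):
    """Principle 3: build (but do not send) a request asking the server
    for an RDF description of the thing the URI names."""
    if not is_http_uri(uri):
        raise ValueError(f"not an HTTP URI, so it cannot be looked up: {uri}")
    return Request(uri, headers={"Accept": RDF_ACCEPT})


req = lookup_request("http://example.org/resource/Thing")
print(req.full_url, req.get_header("Accept"))
# Actually dereferencing it would be: urllib.request.urlopen(req)
```

A name such as `urn:isbn:0451450523` identifies something (principle 1) but fails the check for principle 2, since there is no HTTP location where a client could ask for more information.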

The rationale for these principles is probably obvious. By using URIs to identify individuals, classes, and properties, we obtain names that perform a double duty: as well as referring to the relevant thing, they give us a location on the web where we may look for information about that thing. Other naming schemes accomplish only the first of these duties. However, to benefit from a name that also serves as a web address, the URI must not be a broken link: it should point to relevant information, encoded in one of the expected formats. The benefit is greater still if that information includes URIs pointing to other locations on the web from which additional relevant information can be retrieved.

Rating published datasets

    In 2010 Berners-Lee extended the note referenced above to propose a system for rating datasets, based on the five-star rating system used for hotels. The system, closely related to the principles just listed, is as follows:

    • One-star (*): The data is available on the web with an open license.
    • Two-star (**): The data is structured and machine-readable.
    • Three-star (***): The data does not use a proprietary format.
    • Four-star (****): The data uses only open standards from W3C (RDF, SPARQL).
    • Five-star (*****): The data is linked to that of other data providers.

    Note that every level here includes the previous levels: thus for instance three-star data must also be available on the web in machine-readable form.
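Because the levels are cumulative, a rating is the number of consecutive levels a dataset satisfies, counting from one star upwards. A minimal sketch, assuming each level has already been reduced to a yes/no check; the `Dataset` fields and the `star_rating` function are hypothetical names, not part of Berners-Lee's note:

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    # Hypothetical yes/no checks, one per star level, in rating order.
    on_web_open_license: bool     # one star
    machine_readable: bool        # two stars
    non_proprietary_format: bool  # three stars
    w3c_open_standards: bool      # four stars (RDF, SPARQL)
    linked_to_others: bool        # five stars


def star_rating(ds):
    """Return 0-5 stars; a level only counts if every lower level is met."""
    checks = [
        ds.on_web_open_license,
        ds.machine_readable,
        ds.non_proprietary_format,
        ds.w3c_open_standards,
        ds.linked_to_others,
    ]
    stars = 0
    for ok in checks:
        if not ok:
            break  # stop at the first unmet level
        stars += 1
    return stars
```

The early `break` encodes the rule that every level includes the previous ones: for example, `star_rating(Dataset(True, True, False, True, True))` gives two stars, because data published in a proprietary format cannot reach three stars no matter what else it does.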

    Growth of linked data on the web

    We have shown above a diagram of the linked data cloud for 2007 (Figure 5). For comparison, Figure 6 shows the corresponding diagram for 2011 (the last year for which we have data), illustrating the expansion that took place in the intervening years. The picture only includes datasets that the lod-cloud.net crawler could reach, so it is far from complete.

    Figure 6: Linked data cloud (2011)
    Source: http://lod-cloud.net
    Citation: Linking Open Data cloud diagram (2011), by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net
    License: CC-BY-SA

    The colours on this diagram provide a broad categorisation of the domains of the various datasets. 


    To explore the possibilities of linked data browsers and mashups (which combine data from many sources), look at these examples of working websites based on semantic web technology.

    BBC Music
    The BBC has launched a music portal based on Linked Data at http://www.bbc.co.uk/music.
    LinkedGeoData
    The University of Leipzig has a community project providing street map information based on Linked Data, at http://linkedgeodata.org/.
    US government data
    In 2009 the US and UK governments made commitments to open data. The US government data site is at http://www.data.gov/.
    UK government data
    The UK government data site is at http://data.gov.uk/, with over 25,000 datasets published at the time of writing.

    Suggested reading: Semantic Technologies and Linked Data Foundations

    This article reuses some of the results of the EUCLID project (EU FP7 - 296229). Except for third-party materials and where otherwise stated, the content of this site is made available under a Creative Commons Attribution 3.0 Unported License.