In 2006 Berners-Lee wrote an influential note suggesting principles for the publication of data on the semantic web.
Since then the volume of published data has grown from around 2 billion triples in 2007 to over 30 billion in 2011 (the last time this figure was computed), interconnected by over 500 million RDF links. The main purpose of these links is to establish chains of URIs that refer to the same individuals. Through such links, published datasets are combined into a vast body of data known as a "cloud".
Figure 5 shows a diagram of the linked data cloud for 2007, in which nodes represent published datasets, and links represent sets of RDF triples through which the URIs in one dataset are paired with their counterparts in another dataset. Thus the link from DBpedia to MusicBrainz means that DBpedia includes not only RDF triples that give information about the world, but also triples that link some DBpedia names to their synonyms in MusicBrainz. We have seen examples of such statements in the previous section, including the following triple, which links the two names for the Beatles.
<http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d> <http://www.w3.org/2002/07/owl#sameAs> <http://dbpedia.org/resource/The_Beatles>.
Note that since the "sameAs" relation is transitive and symmetric, two statements of the form "X sameAs Y" and "Y sameAs Z" (or equivalently "Z sameAs Y") can be combined to infer "X sameAs Z"; in this way, lists of synonymous names can be derived from the cloud.
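The derivation of synonym lists can be sketched in a few lines of Python. This is only an illustration, assuming the triples have already been parsed into (subject, predicate, object) tuples; the function name synonym_sets and the example.org URI are hypothetical and do not come from any published dataset. The sketch uses a union-find structure, which is one common way to compute the groups implied by a transitive and symmetric relation.

```python
from collections import defaultdict

SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def synonym_sets(triples):
    """Group URIs that are connected by chains of sameAs statements."""
    parent = {}

    def find(uri):
        # Return the representative of the group that contains uri.
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # shorten the chain as we go
            uri = parent[uri]
        return uri

    for subject, predicate, obj in triples:
        if predicate == SAME_AS:
            # Direction does not matter because sameAs is symmetric.
            parent[find(subject)] = find(obj)

    groups = defaultdict(set)
    for uri in list(parent):
        groups[find(uri)].add(uri)
    return list(groups.values())

MUSICBRAINZ = "http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d"
DBPEDIA = "http://dbpedia.org/resource/The_Beatles"
OTHER = "http://example.org/band/the-beatles"  # hypothetical third name

triples = [
    (MUSICBRAINZ, SAME_AS, DBPEDIA),  # "X sameAs Y"
    (OTHER, SAME_AS, DBPEDIA),        # "Z sameAs Y"
    (DBPEDIA, "http://www.w3.org/2000/01/rdf-schema#label", "The Beatles"),
]
groups = synonym_sets(triples)
```

Here groups contains a single set holding all three URIs, even though no triple links the MusicBrainz name directly to the hypothetical third name. The label triple is ignored because its predicate is not sameAs.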
In his 2006 note, Berners-Lee set out four simple principles for publishing data on the web. These are best seen as rules of best practice rather than rules that must be obeyed: the idea is that the more people follow these principles, the more their data will be usable by others.
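In brief, the principles are as follows:
1. Use URIs as names for things.
2. Use HTTP URIs, so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
4. Include links to other URIs, so that people can discover more things.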
The rationale for these principles is probably obvious. By using URIs to identify individuals, classes, and properties, we obtain names that perform a double duty: as well as referring to the relevant thing, they give us a location on the web where we may look for information about that thing. Other naming schemes accomplish only the first of these duties. However, to obtain benefit from a name that also serves as a web address, the URI should not be a broken link. It should point to relevant information, encoded in one of the expected formats. This benefit will be enhanced further if the information includes URIs that point to other locations on the web from which additional relevant information might be recovered.
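The "double duty" of a URI can be illustrated with a minimal Python sketch that prepares an HTTP lookup for a name, asking for an RDF serialisation through the Accept header. The function name build_lookup and the choice of Turtle as the default format are assumptions made for this example; whether a particular server honours the header depends on how it is configured.

```python
import urllib.request

def build_lookup(uri, media_type="text/turtle"):
    """Prepare an HTTP GET request that asks for RDF data about a URI."""
    return urllib.request.Request(uri, headers={"Accept": media_type})

# The URI is the DBpedia name for the Beatles used earlier in this section.
request = build_lookup("http://dbpedia.org/resource/The_Beatles")

# Sending the request needs network access, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     rdf_text = response.read().decode("utf-8")
```

If the URI is a broken link, or the server returns only a human-readable page, the name still identifies its referent but the second duty is lost.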
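In 2010 Berners-Lee extended the note referenced above to propose a system for rating datasets, based on the five-star rating system used for hotels. Closely related to the principles just listed, the system is as follows:
One star: the data is available on the web (in whatever format) under an open licence.
Two stars: the data is available as machine-readable structured data (for example, an Excel spreadsheet rather than a scanned image of a table).
Three stars: the data is available in a non-proprietary format (for example, CSV rather than Excel).
Four stars: the data uses open standards from the W3C (RDF and SPARQL) to identify things, so that others can point at them.
Five stars: the data is linked to other people's data to provide context.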
Note that every level here includes the previous levels: thus for instance three-star data must also be available on the web in machine-readable form.
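The cumulative nature of the rating can be sketched as a small Python function. The criterion names below are hypothetical labels that paraphrase the levels in Berners-Lee's note; they are not part of any official vocabulary. The point of the sketch is only that a dataset earns a star when it meets that level's criterion and every earlier one.

```python
# Hypothetical labels for the five levels, in order.
CRITERIA = [
    "on_web_with_open_licence",  # one star
    "machine_readable",          # two stars
    "non_proprietary_format",    # three stars
    "uses_w3c_standards",        # four stars
    "linked_to_other_data",      # five stars
]

def star_rating(properties):
    """Count the levels satisfied in order, stopping at the first gap."""
    stars = 0
    for criterion in CRITERIA:
        if not properties.get(criterion, False):
            break
        stars += 1
    return stars

csv_dataset = {
    "on_web_with_open_licence": True,
    "machine_readable": True,
    "non_proprietary_format": True,
}
scanned_table_with_links = {
    "on_web_with_open_licence": True,
    "linked_to_other_data": True,  # does not count: level two is not met
}
```

With these inputs, star_rating(csv_dataset) gives 3 and star_rating(scanned_table_with_links) gives 1, since a later criterion cannot make up for an earlier one that is missing.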
Figure 5 above shows the linked data cloud for 2007. For comparison, Figure 6 shows the corresponding diagram for 2014 (the last year for which we have data), illustrating the expansion that took place over those years. Note that the diagram covers only the datasets reachable by the lod-cloud.net crawler, so it is far from complete.
The colours on this diagram provide a broad categorisation of the domains of the various datasets.
To explore the possibilities of linked data browsers and mashups (which combine data from many sources), look at these examples of working websites based on semantic web technology.
Suggested reading: Semantic Technologies and Linked Data Foundations
This article reuses some of the results of the EUCLID project (EU FP7 - 296229). Except for third-party materials and where otherwise stated, the content of this site is made available under a Creative Commons Attribution 3.0 Unported License.