
Top 3 reasons to use Linked Open Data to create smart data

posted Mar 4, 2016, 10:09 AM by Unknown user   [ updated Jun 9, 2018, 9:56 AM ]

Big data has received enormous hype in recent years, to the point of becoming a buzzword.

Despite that, the question always remains the same: what can I do with data?

It’s relatively easy to generate an abundance of data. Data sets, open or private, are exploding. But data alone are useless: it is how you combine them that makes the difference. You need to convert them into knowledge that drives business results. That’s smart data.

Here is where Linked Open Data (LOD) can help! Yes, we think Linked Open Data is one of the first, if not the first, concrete cases of big data. Linked Open Data and the semantic web are not new concepts, but there are three key reasons why today they’re the best way to create smart data.

REASON #1 - The long tail is where the value is

Data sets on the web are exploding. Of course, they are still fragmented and of uneven quality, but it is only a matter of time before that is fixed. That is a great opportunity for companies. So, the key question is: which data sets should I consider? It is simple. All of them! Yes, you read that correctly. With Linked Open Data it does not make sense to use only the biggest ones, because the highest-quality data are often in the smallest data sets, the ones in the long tail of everything available.
Use them, or you will lose a lot of value.

That immediately poses two problems. The first is how to manage all these data sets; the second is how to manage inconsistencies. The second is solved by LOD itself, as we explain in REASON #2. The first you have to handle manually, unless you use a service like LinkedData.Center. Through a dedicated ontology, KEES (Knowledge Exchange Engine Schema), LinkedData.Center offers powerful data ingestion capabilities that let you easily configure which data sets are used, and how and when.
Configure it once and update it over time.

We call this the knowledge base configuration file. Put this file into LinkedData.Center and you get a knowledge base ready to use. The nice thing is that the resulting knowledge base can easily be shared or sold just by exchanging this little file. Imagine!
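To make the idea concrete, here is a minimal sketch in plain Python of the kind of information such a configuration might capture: which sources to ingest, how often, and how much to trust each one. The URLs, field names, and scheduling logic are purely illustrative assumptions, not the actual KEES schema.

```python
# Illustrative sketch of a knowledge base configuration: which data
# sets to ingest, how often, and how much to trust each source.
# All names and URLs here are hypothetical, not the real KEES vocabulary.
datasets = [
    {"source": "http://example.org/big-open-dataset",  "update": "weekly", "trust": 0.5},
    {"source": "http://example.org/small-curated-set", "update": "daily",  "trust": 0.9},
]

def ingestion_plan(datasets, schedule="daily"):
    """Return the sources scheduled for ingestion on the given cadence."""
    return [d["source"] for d in datasets if d["update"] == schedule]

print(ingestion_plan(datasets))  # the daily-updated sources
```

The point is that the whole "how, when, and which" of data ingestion reduces to a small declarative description that can be exchanged, versioned, and updated over time.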

REASON #2 - It’s all a matter of trust

Here we are looking at the problem of data inconsistencies, which will of course occur frequently if we use all the available data sets. With LOD you have to completely change the way you think about data: the aim is not to have 100% correct data, but to have data you trust!
If a company pays a data provider a lot of money for a dataset, it is only because it trusts that provider. It is the same mechanism that makes us trust data on Wikipedia more than on an unknown website. In the end, it all comes down to the trust we attribute to sources.

Now, the good thing about LOD is that every dataset you use retains its provenance information, so you always know the exact source of each piece of data. Here is the trick: assign a trust level to each source, so that in case of inconsistency the knowledge base can decide which data you trust more. This is called a "trust map". The remarkable thing is that the trust map can be updated dynamically according to any criteria you like. That is something LinkedData.Center does very well:
being a fully W3C-standard quad store, it keeps the provenance information and lets you manage inconsistencies so you always get the most trusted data.
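As a toy illustration, the following Python sketch shows how a trust map can resolve a conflict between two sources that state different values for the same fact. The quads, source names, and trust scores are hypothetical; a real quad store would track provenance with named graphs and query them with SPARQL.

```python
# Each fact is a quad: (subject, predicate, object, source graph).
# Two hypothetical sources disagree on the same statement.
quads = [
    ("acme", "employees", "500", "http://data.example/official"),
    ("acme", "employees", "480", "http://data.example/crowd"),
]

# The trust map: a score per source, which can be updated dynamically.
trust_map = {
    "http://data.example/official": 0.9,
    "http://data.example/crowd":    0.4,
}

def most_trusted(quads, trust_map, subject, predicate):
    """Among conflicting statements, keep the one from the most trusted source."""
    candidates = [q for q in quads if q[0] == subject and q[1] == predicate]
    return max(candidates, key=lambda q: trust_map.get(q[3], 0.0))

s, p, value, source = most_trusted(quads, trust_map, "acme", "employees")
print(value, source)  # → 500 http://data.example/official
```

Because provenance travels with every statement, changing a source's score in the trust map immediately changes which answer the knowledge base prefers, with no need to re-ingest anything.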

REASON #3 - Think big, start small… but start now

When using LOD you don’t need to do everything from the beginning. Think of a child: they start by learning a few things, and their knowledge grows constantly. The more it grows, the more sophisticated it becomes, and the more complex the reasoning and inferences it can support.
It is the same with LOD. You start with the datasets you have and add, over time, any others you can find, public or private, open or not. The result is a growing knowledge base that continuously improves and increases in value. Moreover, since LinkedData.Center is offered as a service, you can scale your knowledge base at any time, in both performance and storage, with no limits, paying only for what you actually use.

This is not just a principle: it is a key value of Linked Open Data. The Resource Description Framework (RDF) used to create LOD is tolerant of any dataset. You do not need to structure the data model up front, which would be almost impossible anyway when you do not know which datasets will turn out to be relevant among the many available, and over long periods of time. This is an enormous advantage over other approaches, such as the typical relational model, for at least two reasons. First, it dramatically simplifies any data integration or knowledge creation project, which can be managed step by step, knowing that whatever has been created will remain relevant and valid. Second, it uses ontologies, which can evolve consistently with the knowledge base and allow companies to share and clarify concepts, definitions, and vocabulary across different organizational structures.
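The schema tolerance described above can be sketched in a few lines of plain Python: triples from independently designed datasets merge into one graph, and new predicates can appear at any time without any migration. The subjects and abbreviated CURIE-style identifiers are illustrative, not a real data model.

```python
# Sketch of RDF's schema tolerance: triples from different datasets
# merge into one graph with no upfront schema. Identifiers are illustrative.
graph = set()

# First dataset: basic facts about people.
graph |= {
    ("alice", "rdf:type", "foaf:Person"),
    ("alice", "foaf:name", "Alice"),
}

# A later dataset adds predicates nobody planned for; no migration needed,
# because a triple store accepts any (subject, predicate, object) statement.
graph |= {
    ("alice", "ex:worksFor", "acme"),
    ("acme", "rdf:type", "org:Organization"),
}

# Query: everything known about "alice", regardless of which dataset said it.
about_alice = {(p, o) for (s, p, o) in graph if s == "alice"}
print(sorted(about_alice))
```

Contrast this with a relational schema, where the second dataset would have required altering tables; here the earlier data stays valid untouched while the model grows around it.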

Bigger players such as IBM (with Watson) and Google (with its Knowledge Graph) have already started doing this. They have understood that building a knowledge base is essential and takes time. Rome wasn’t built in a day. So, it is up to you.
You can sit, wait, and watch while someone else steps up, OR you can start, like a child, developing your knowledge and investing in it, to get smart data that is useful for your business.