News archive

Watch out for the robot's error

posted Jun 7, 2019, 7:31 AM by Enrico Fagnoni

The result obtained from an algorithm based on neural networks cannot be explained. Moreover, it always has a statistical error, which is often also quantifiable.

Lack of proof is the fundamental difference between neural networks and other A.I. tools such as inferential systems based on the open world assumption (i.e. rule systems that tolerate missing information). This kind of A.I. system, unlike a neural network, is always able to justify its choices. The Semantic Web is the best-known example.

The prevailing trend reduces the whole of A.I. to machine learning alone (in particular to neural networks), but there are many other ways of doing things. The technique of making a machine learn by example is undoubtedly the one that requires the least cognitive effort from human beings, and perhaps that is why it generates so many expectations.

For things to work, we always need a logical-deductive substrate. Perhaps, in a more or less distant future, a machine could deduce it too, but for now it MUST always be modeled "by hand", and it must always be an integral part of every automatic system that makes decisions.

In other words, you always need to embed the models generated through machine learning in a formal logical context, which evaluates rules defined by humans and based on socially shared conceptualizations. To build this logical model, you need to think a lot, discuss a lot and work hard to formalize it. Maybe that's why we tend to pretend it's not needed.

In exchange for such significant work, you can always know what you're talking about, what you're doing and why you're doing it. 
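
As a minimal sketch of this idea, the Python fragment below places a stand-in for a learned model inside a small set of human-written rules, so that every decision comes back together with its reason. Everything in it (the `risk_score` function, the rule names, the thresholds) is hypothetical and only illustrates the pattern.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


def risk_score(features: dict) -> float:
    """Hypothetical stand-in for a trained model: returns a score in [0, 1]."""
    return min(1.0, features.get("late_payments", 0) / 10)


@dataclass
class Rule:
    name: str                                # human-readable reason
    applies: Callable[[dict, float], bool]   # condition on features and score
    decision: str


# Rules written by people; evaluated in order, the first match wins.
RULES: List[Rule] = [
    Rule("customer is a minor", lambda f, s: f.get("age", 0) < 18, "reject"),
    Rule("model score above 0.7", lambda f, s: s > 0.7, "manual review"),
    Rule("no rule objected", lambda f, s: True, "accept"),
]


def decide(features: dict) -> Tuple[str, str]:
    """Return (decision, reason); the model score is one input among others."""
    score = risk_score(features)
    for rule in RULES:
        if rule.applies(features, score):
            return rule.decision, rule.name
    raise RuntimeError("the rule set must end with a catch-all rule")
```

Calling `decide({"age": 40, "late_payments": 9})` returns both the decision and the name of the rule that fired, which is the part a bare model cannot provide.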

Machine Learning: oh yeah?

posted May 17, 2019, 1:51 AM by Enrico Fagnoni   [ updated May 17, 2019, 1:54 AM ]

Recently, some people have started to talk about training AI models while preserving the privacy of the training data (e.g. "A Demonstration of Sterling: A Privacy-Preserving Data Marketplace", Nick Hynes, David Dao, David Yan, Raymond Cheng, Dawn Song, VLDB Demo 2018).

In my opinion, creating an ML model makes sense only on public data sets or, in any case, on data that can be verified by those who will use the resulting model. Otherwise, the results are indistinguishable from random values and would have to be revalidated experimentally at each update of the model.

Using a "black box" model is a total act of faith. We fall into the "reputational" scheme where the crowd decides what is right and what is wrong without having the elements to do so.  At the very least, those who produce an opaque model should be responsible for the errors produced by the model they created.

The risk of switching from fake news to fake data and/or fake models is very high. 

We need the "Oh yeah?" button

In this regard, I invite you to read this passage from a 1997 article by Sir Tim Berners-Lee, which I reproduce here:

Deeds are ways we tell the computer, the system, other people, the Web, to trust something. How does the Web tell us?

It can happen in lots of ways but again it needs a clear user interface. It's no good for one's computer to be aware of the lack of security about a document if the user can ignore it. But then, most of the time as user I want to concentrate on the content not on the metadata: so I don't want the security to be too intrusive. The machine can check back the reasons why it might trust a document automatically or when asked. Here is just one way I could accept it.

At the toolbar (menu, whatever) associated with a document there is a button marked "Oh, yeah?". You press it when you lose that feeling of trust. It says to the Web, "so how do I know I can trust this information?". The software then goes directly or indirectly back to metainformation about the document, which suggests a number of reasons. These are like incomplete logical proofs. One might say,

"This offer for sale is signed with a key mentioned in a list of keys (linked) which asserts that the Internet Consumers Association endorses it as reputable for consumer trade in 1997 for transactions up to $5000. The list is signed with key (value) which you may trust as an authority for such statements."

Your computer fetches the list and verifies the signature because it has found in a personal statement that you trust the given key as being valid for such statements. That is, you have said, or whoever you trusted to set up your profile said,

"Key (value) is good for verification of any statement of the form `the Internet Consumers Association endorses page(p) as reputable for consumer trade in 1997 for transactions up to $5000'"

and you have also said that "I trust for purchases up to $3000 any page(p) for which `the Internet Consumers Association endorses page(p) as reputable for consumer trade in 1997 for transactions up to $5000'"

The result of pressing on the "Oh, yeah?" button is either a list of assumptions on which the trust is based, or of course an error message indicating either that a signature has failed, or that the system couldn't find a path of trust from you to the page.

Notice that to do this, we do not need a system which can derive a proof or disproof of any arbitrary logical assertion. The client will be helped by the server, in that the server will have an incentive to send a suggested proof or set of possible proof paths. Therefore it won't be necessary for the client to search all over the web for the path.

The "Oh, yeah?" button is in fact the relatively easy bit of human interface. Allowing the user to make statements above and understand them is much more difficult. About as difficult as programming a VCR clock: too difficult. So I imagine that the subset of the logic language which is offered to most users will be simple: certainly not Turing complete!

The hype around ML and AI must not make us forget that some problems, and their solutions, are even older than the internet.
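
The "path of trust" mentioned in the passage can be pictured as a search over a graph of endorsements. The Python sketch below is only an illustration under strong assumptions: the `ENDORSES` table and its entries are hypothetical, no signature is actually verified, and the search runs locally, whereas the quoted text expects the server to suggest the proof path.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical endorsements: who vouches for whom (no real signatures here).
ENDORSES: Dict[str, List[str]] = {
    "me": ["key:ica-list"],
    "key:ica-list": ["Internet Consumers Association"],
    "Internet Consumers Association": ["page:offer-for-sale"],
}


def oh_yeah(start: str, target: str) -> Optional[List[str]]:
    """Breadth-first search for a chain of endorsements from start to target."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path  # the list of assumptions the trust is based on
        for nxt in ENDORSES.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no path of trust from start to target


trusted = oh_yeah("me", "page:offer-for-sale")
unknown = oh_yeah("me", "page:unknown")
```

Here `trusted` holds the chain of endorsements that justifies the page, while `unknown` is `None`, the case in which the button would report that no path of trust was found.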

Re-thinking applications in the edge computing era

posted Mar 17, 2019, 4:51 AM by Enrico Fagnoni   [ updated Mar 18, 2019, 2:18 AM ]

The EU GDPR regulation was a cornerstone of the Information Society. More or less, it states that the ownership of data is an inalienable right of the data producer; before GDPR, data ownership was something marketable. Now, to use someone else's data, you always need permission, and that permission can be revoked at any time. Besides this, IoT requires more and more local data processing, which drives the edge computing paradigm.

Recent specifications like SOLID and IPFS promise radical but practical solutions to move toward a real data distribution paradigm, trying to restore the original objective of the web:  knowledge sharing. 

This view, where each person or machine has full control of their own data, contrasts with the centralized application data architecture used by the majority of applications.
Many signs tell us that this new vision is gaining consensus, both in the political and social world;  but today, even when applications claim to be distributed (e.g. Wikipedia), as a matter of fact, they still adopt a centralized data management architecture.

According to Sir Tim Berners-Lee, "The future is still so much bigger than the past". To be ready, we need to rethink data architectures, allowing applications to use information produced and managed by someone else, people or machines, outside our control.

Eric Brewer's theorem (also known as the CAP theorem) states that it is impossible for a distributed data store to simultaneously provide more than two out of the following three guarantees:
  • Consistency: Every read receives the most recent write or an error
  • Availability: Every request receives a (non-error) response – without the guarantee that it contains the most recent write
  • Partition tolerance: The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes
CAP is frequently misunderstood as if one has to choose to abandon one of the three guarantees at all times. In fact, the choice is really between consistency and availability only when a network partition or failure happens; at all other times, no trade-off has to be made. 

But in a truly distributed data model, where datasets are not under your control, network failure is ALWAYS a possibility, so you always have to choose.

Dynamic caching is probably the only practical solution to face the dataset distribution problem, but as soon as you replicate data, a tradeoff between consistency and latency arises.

Daniel J. Abadi from Yale University pointed out in 2010 that even when the system is running normally, in the absence of network errors (the "else" case, E), one has to choose between latency (L) and consistency (C). This is known as the PACELC theorem.
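
The trade-off between latency and consistency can be seen even in a toy cache. The following Python sketch is an illustration only: `RemoteDataset` and `TTLCache` are hypothetical names, the network is simulated by a read counter, and the clock is passed in explicitly so that the behavior is reproducible.

```python
class RemoteDataset:
    """Hypothetical remote source that we do not control."""

    def __init__(self, value):
        self.value = value
        self.reads = 0

    def read(self):
        self.reads += 1  # each remote read stands for a slow network round trip
        return self.value


class TTLCache:
    """Serve a local copy until it is at least `ttl` seconds old."""

    def __init__(self, source, ttl):
        self.source = source
        self.ttl = ttl
        self._copy = None
        self._fetched_at = None

    def get(self, now):
        expired = self._fetched_at is None or now - self._fetched_at >= self.ttl
        if expired:
            self._copy = self.source.read()
            self._fetched_at = now
        return self._copy


source = RemoteDataset("v1")
cache = TTLCache(source, ttl=60)

first = cache.get(now=0)    # miss: pays the remote round trip
source.value = "v2"         # the owner updates the data elsewhere
stale = cache.get(now=30)   # hit: fast, but still the old value
fresh = cache.get(now=60)   # expired: slow again, but up to date
```

With a longer `ttl` the application answers faster and is wrong for longer; with a shorter one it is more consistent and pays the network cost more often. No network failure is involved at any point.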

What does all this mean? You must start rethinking applications, letting go of the deterministic illusion that functions return the same outputs whenever you provide the same inputs.
In fact, the determinism on which much of today's information technology is based should be questioned. We have to start thinking about everything in terms of probability.

That's already happening with search engines (you do not get the same result for the same query) and with social networks (you do not see the same list of messages). It is not a feature; it is due to technical constraints. But Facebook, Google, and many other companies cleverly turned this problem into an opportunity, for instance by prioritizing ads.

If the edge computing paradigm gains momentum, all applications, including corporate ones, will have to address similar issues. For instance, the customer/supplier registry could (or should) be distributed.

Technologies and solutions such as IPFS, Linked Data, and RDF graph databases provide practical ways of caching and querying distributed datasets, helping to resolve inconsistencies and performance issues. But they cannot be considered a drop-in replacement for older technology: they are tools for designing a new generation of applications that are able to survive in a network of distributed datasets.

Introducing the Financial Report Vocabulary

posted Feb 19, 2019, 7:33 AM by Enrico Fagnoni   [ updated Feb 19, 2019, 7:33 AM ]

The Financial Report Vocabulary (FR) is an OWL vocabulary to describe a generic financial report.

The FR vocabulary can be used to capture different perspectives of report data like historical trends, cross-department, and component breakdown.

FR extends the W3C RDF Data Cube Vocabulary and it is inspired by the Financial Report Semantics and Dynamics Theory.

New KEES specifications

posted Feb 19, 2019, 7:19 AM by Enrico Fagnoni   [ updated Feb 19, 2019, 7:27 AM ]

For computers to work for us, they must understand data: not just the grammar and the syntax, but the real meaning of things.

KEES (Knowledge Exchange Engine Service) proposes some specifications to describe domain knowledge in order to make it tradeable and shareable.

KEES allows you to formalize and license:

  • how to collect the right data,
  • how much you can trust your data,
  • what new information you can deduce from the collected data,
  • how to answer specific questions using data

A.I. and humans can use this know-how to reuse and enrich existing knowledge. KEES is a Semantic Web Application.

KEES Overview

Released µSilex

posted Oct 1, 2018, 12:32 PM by Enrico Fagnoni   [ updated Feb 19, 2019, 7:12 AM ]

µSilex (aka micro Silex) is a micro framework inspired by Pimple and PSR standards. All with less than 100 lines of code!

µSilex is an attempt to build a standard middleware framework for developing micro-services and API endpoints that require maximum performance with a minimal memory footprint.

Middleware is now a very popular topic in the developer community. The idea behind it is “wrapping” your application logic with additional request processing logic, and then chaining as many of those wrappers as you like. So when your server receives a request, it is first processed by your middlewares, and then, after you generate a response, the response is also processed by the same set.
It may sound complicated, but in fact, it’s very simple if you look at some examples of what could be a middleware:

  • Firewall – check if requests are allowed from a particular IP
  • JSON Formatter – parse JSON post data into parameters for your controller, then turn your response into JSON before sending it back
  • Smart proxies – forward a request to other servers, filtering and enriching the message payload
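
The wrapping idea can be sketched in a few lines. The example below is written in Python for illustration only; it is not µSilex code and does not reflect its API. The names `firewall`, `json_formatter`, `controller` and `chain`, as well as the allowed IP address, are hypothetical.

```python
import json
from typing import Any, Callable, Dict, List

Request = Dict[str, Any]
Response = Dict[str, Any]
Handler = Callable[[Request], Response]
Middleware = Callable[[Handler], Handler]


def firewall(next_handler: Handler) -> Handler:
    """Reject requests from IPs outside a hypothetical allow-list."""
    allowed = {"10.0.0.1"}

    def handler(request: Request) -> Response:
        if request.get("ip") not in allowed:
            return {"status": 403, "body": "forbidden"}
        return next_handler(request)

    return handler


def json_formatter(next_handler: Handler) -> Handler:
    """Decode the JSON request body, then encode the response body."""

    def handler(request: Request) -> Response:
        request = dict(request, params=json.loads(request.get("body") or "{}"))
        response = next_handler(request)
        return dict(response, body=json.dumps(response["body"]))

    return handler


def controller(request: Request) -> Response:
    """Application logic: greet the caller by name."""
    return {"status": 200, "body": {"hello": request["params"].get("name")}}


def chain(middlewares: List[Middleware], app: Handler) -> Handler:
    """Wrap `app` so that the first middleware in the list runs outermost."""
    for middleware in reversed(middlewares):
        app = middleware(app)
    return app


app = chain([firewall, json_formatter], controller)
ok = app({"ip": "10.0.0.1", "body": '{"name": "Ada"}'})
blocked = app({"ip": "1.2.3.4", "body": "{}"})
```

The request travels through `firewall` and then `json_formatter` before reaching `controller`, and the response travels back through the same wrappers in reverse order. A blocked request never reaches the inner layers.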

SDaaS community edition released

posted Sep 18, 2018, 1:22 AM by Enrico Fagnoni

A simplified version of LinkedData.Center SDaaS™ platform was released with an open source model.

Metaphors, Models, and Theories

posted Aug 30, 2018, 9:52 PM by Enrico Fagnoni   [ updated Sep 1, 2018, 12:00 AM ]

Because most software developers are not familiar with using “formal theories”, it is worth explaining what a theory is.

In his book, “Models. Behaving. Badly.”, Emanuel Derman explains the differences between metaphors, models, and theories.
  • A metaphor describes something less understandable by relating it to something more understandable.
  • A model is a specimen that exemplifies the ideal qualities of something. Models tend to simplify. There tend to always be gaps between models and reality. Models are analogies; they tend to describe one thing relative to something else. Models need a defense or an explanation.
  • A theory describes absolutes. Theories are the real thing. A theory describes the object of its focus. A theory does not simplify. Theories are irreducible, the foundation on which new metaphors can be built. A successful theory can become a fact. A theory describes the world and tries to describe the principles by which the world operates. A theory can be right or wrong, but it is characterized by its intent: the discovery of essence.
Theories can be expressed logically, mathematically, symbolically, or in common language; but are generally expected to follow well understood principles of logic or rational thought.

A theory can be implemented within a robust model that is understandable by computer software.

Linked Data in Robotics and Industry 4.0

posted Mar 27, 2018, 10:10 AM by Enrico Fagnoni   [ updated Mar 27, 2018, 10:32 AM ]

Industry 4.0 is a collective term (created in Germany) for the technological concepts of cyber-physical systems, the Internet of Things and the Internet of Services, leading to the vision of the Smart Factory. Within a modular structured Smart Factory, cyber-physical systems monitor physical processes and make decentralized decisions. Over the Internet of Things, cyber-physical systems communicate and cooperate with each other and humans in real time.

As identified in both academia and industry, there are several design principles in Industry 4.0, which support companies in identifying and implementing Industry 4.0 scenarios:

  • Interoperability: the ability of cyber-physical systems (i.e. workpiece carriers or assembly stations) and humans to connect and communicate via the Internet of Things 
  • Virtualization: linking sensor data (from monitoring physical processes) with virtual plant models and simulation models 
  • Decentralization: the ability of cyber-physical systems within Smart Factories to make decisions on their own
  • Real-Time Capability: the capability to collect and analyze data and provide the derived insights immediately
  • Service Orientation: offering of services (of cyber-physical systems, humans or Smart Factories)
  • Modularity: flexible adaptation of Smart Factories to changing requirements by replacing or expanding individual modules
In addition, one of the aims in robotics is to build smarter robots that can communicate, collaborate and operate more naturally and safely. Increasing a robot’s knowledge and intelligence is vital for the successful implementation of Industry 4.0, since traditional approaches are not flexible enough to respond to the rapidly changing demands of new production processes and their growing complexity. Linked data represents a promising approach to overcoming the limitations of state-of-the-art solutions. The following list of topics is indicative:

  • Knowledge Representation for Robotics 
  • Data integration 
  • Motion and task planning
  • Manipulation and grasping
  • Object and place recognition
  • Human-Robot and Robot-Robot Interaction
  • Navigation
  • Databases for robotics applications
  • Multidisciplinary Topics 

China’s next Generation Artificial Intelligence Development Plan

posted Mar 19, 2018, 2:36 AM by Enrico Fagnoni   [ updated Mar 19, 2018, 2:44 AM ]

The rapid development of artificial intelligence (AI) will profoundly change human society and life and change the world. To seize the major strategic opportunity for the development of AI, to build China’s first-mover advantage in the development of AI, to accelerate the construction of an innovative nation and global power in science and technology, in accordance with the requirements of the CCP Central Committee and the State Council, this plan has been formulated.

Download the translation
