
Machine Learning: oh yeah?

posted May 17, 2019, 1:51 AM by Enrico Fagnoni   [ updated May 17, 2019, 1:54 AM ]
Recently, people have started talking about training AI models while preserving the privacy of the training data (e.g. Nick Hynes, David Dao, David Yan, Raymond Cheng, Dawn Song, "A Demonstration of Sterling: A Privacy-Preserving Data Marketplace", VLDB Demo 2018).

In my opinion, creating an ML model makes sense only on public data sets or, at least, on data that can be verified by those who will use the resulting model. Otherwise, the model's outputs are indistinguishable from random values and must be revalidated experimentally at every update of the model.

Using a "black box" model is a pure act of faith. We fall into a "reputational" scheme where the crowd decides what is right and what is wrong without having the elements to judge. At the very least, those who produce an opaque model should be held responsible for the errors it produces.

The risk of switching from fake news to fake data and/or fake models is very high. 

We need the "Oh, yeah?" button

In this regard, I invite you to read this passage from a 1997 article by Sir Tim Berners-Lee, which I copy here for convenience:

Deeds are ways we tell the computer, the system, other people, the Web, to trust something. How does the Web tell us?

It can happen in lots of ways but again it needs a clear user interface. It's no good for one's computer to be aware of the lack of security about a document if the user can ignore it. But then, most of the time as a user I want to concentrate on the content, not on the metadata: so I don't want the security to be too intrusive. The machine can check back the reasons why it might trust a document automatically or when asked. Here is just one way I could accept it.

At the toolbar (menu, whatever) associated with a document there is a button marked "Oh, yeah?". You press it when you lose that feeling of trust. It says to the Web, "so how do I know I can trust this information?". The software then goes directly or indirectly back to metainformation about the document, which suggests a number of reasons. These are like incomplete logical proofs. One might say,

"This offer for sale is signed with a key mentioned in a list of keys (linked) which asserts that tthe Internet Consumers Association endoses it as reputable for consumer trade in 1997 for transactions up to up to $5000. The list is signed with key (value) which you may trust as an authority for such statements."

Your computer fetches the list and verifies the signature because it has found in a personal statement that you trust the given key as being valid for such statements. That is, you have said, or whoever you trusted to set up your profile said,

"Key (value) is good for verification of any statement of the form `the Internet Consumers Association endorses page(p) as reputable for consumer trade in 1997 for transactions up to up to $5000. '"

and you have also said that "I trust for purchases up to $3000 any page(p) for which `the Internet Consumers Association endorses page(p) as reputable for consumer trade in 1997 for transactions up to $5000.'"

The result of pressing on the "Oh, yeah?" button is either a list of assumptions on which the trust is based, or of course an error message indicating either that a signature has failed, or that the system couldn't find a path of trust from you to the page.

Notice that to do this, we do not need a system which can derive a proof or disproof of any arbitrary logical assertion. The client will be helped by the server, in that the server will have an incentive to send a suggested proof or set of possible proof paths. Therefore it won't be necessary for the client to search all over the web for the path.

The "Oh, yeah?" button is in fact the realively easy bit of human interface. Allowing the user to make statements above and understand them is much more difficult. About as difficult as programming a VCR clock: too difficult. So I imagine that the subset of the logic language which is offered to most users will be simple: certainly not Turing complete!

The hype around ML and AI must not make us forget that some problems, and their solutions, are even older than the Internet.