Technical Building Blocks
10 key ideas that define why AI works
Artificial intelligence is a highly technical field and when you first start looking into the detail it is natural to feel somewhat overwhelmed.
Even people coming into AI with a strong background in programming can feel a little uneasy.
Let’s look at the to-do list for technologists wanting to get a proper grip on the technology:
- You first have to deal with the mathematical theory that defines how neural networks operate. You could choose to skip this part and move right into how to actually use a framework like TensorFlow to build something, which is completely understandable. But I have to say that your decision on this reflects whether you are an engineer at heart or more of a technician. Both are equally important, but the first intellectual test you’ll face is right here: are you going to get stuck in, or wimp out?
- You then need to appreciate the nuances of how neural networks are trained (which is what the term “machine learning” refers to) and how to set a network up before it is trained by choosing the right set of hyperparameters.
- Then there’s the question of how neural networks are realised using software models and how those software models are executed by a range of hardware platforms.
- The AI field also has its fair share of architectural approaches, like deep neural nets, CNNs, RNNs, GANs and many more – plus there’s a plethora of AI-specific acronyms to get to grips with.
- AI also comes with its own special set of tools: designers have easy access to a range of powerful AI development frameworks like TensorFlow, Caffe and Keras, while the large cloud players all offer services such as Azure Machine Learning and Amazon Machine Learning that allow you to design, train and test AI models and then deploy them into a production environment, although this can be a very expensive way to go.
- There are also an increasing number of high functionality APIs, such as Google Cloud Natural Language and Amazon Rekognition Video that can be used to quickly inject powerful AI functionality into software projects.
The challenge in this Module is to simplify all of this complexity down into 10 simple ideas that represent technical building blocks of AI that allow the technology to do what it does.
In other words, every single AI system in production today, including those at the very leading edge – like Google/DeepMind AlphaGo and IBM Project Debater – uses some or all of these technical building blocks, and nothing else.
So once you clearly understand these key ideas – the technical building blocks of AI – then the task of understanding how a given AI system works will be a lot easier.
There are many good reasons for boiling all the technical detail down to the constituent ideas used in all AI systems, but one of the most important is that it will allow you to clearly see how limited these ideas still are and also where there are clear possibilities to improve by defining new ideas that could make AI systems even more powerful than they already are.
We will focus more on what things do, and less on how they are realised
The approach we’ll be taking in this Module is to firstly segment the AI field into the various application classes (e.g. image recognition, natural language processing, personal assistants, chatbots, speech recognition, language translation, robotics etc.).
Next, for each application class, we will identify the main AI architectural ideas used to build real AI systems that address problems in that class (e.g. neural networks, training, convolution, adversarial, recursion etc.).
When done, we will have identified a list of the main technical building blocks used in real AI systems which we’ll resolve into 10 core ideas, or technical building blocks.
The final stage will be to clearly explain the intuitive meaning of each of these building blocks.
To understand what I mean by this, let’s look at a practical example by comparing two different ways to explain a neural network:
- Explanation – Type 1: A “neural network” consists of a set of neural units – nodes – that are connected together in a multi-layered structure where each layer (L) contains a sub-set of nodes and each node in that layer L is connected to each of the nodes in two additional layers (L-1) and (L+1) which are positioned on either side of layer L and each contain additional sub-sets of nodes. The nodes in layer L-1 provide the inputs to the nodes in layer L while the nodes in layer L+1 accept their input from the nodes in layer L. The input to the entire structure is applied to nodes in the first, or input layer while the output appears at the nodes in the last, or output layer.
- Explanation – Type 2: A “neural network” is a way of discovering a particular mathematical function that can approximately recognize the presence of the average of a set of already-known patterns (e.g. gorillas) that exist within a set of data structures (e.g. image files). The neural network thus has the ability to perform complex, multi-dimensional interpolation in order to recognize whether a given data structure might contain a pattern that is similar to the average of the patterns it was trained to recognize.
I should say that neither of these explanations is totally rigorous but they’re not meant to be – they just need to be good enough to get the basic ideas across.
You will see that there is a big difference between these two explanations:
The first, Type 1, explanation says what a neural network is, but not what it does.
The second, Type 2, explanation says what a neural network does, but not what it is.
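To make the contrast concrete, here is a minimal sketch in plain Python of a very small network seen both ways. Everything in it is an assumption made purely for illustration: the layer sizes, the weight values (which are made up, not trained), the choice of tanh as the activation, and the helper names `dense_layer` and `network`.

```python
import math

def dense_layer(inputs, weights, biases):
    """One fully connected layer: every input feeds every node in the layer."""
    outputs = []
    for node_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(node_weights, inputs)) + bias
        outputs.append(math.tanh(total))  # activation chosen for illustration
    return outputs

# Type 1 view: what the network IS. Layers of connected nodes.
# These weights are invented illustrative values, not trained ones.
HIDDEN_WEIGHTS = [[0.5, -0.3], [0.8, 0.2], [-0.6, 0.9]]  # 3 hidden nodes, 2 inputs each
HIDDEN_BIASES = [0.0, 0.1, -0.1]
OUTPUT_WEIGHTS = [[0.7, -0.4, 0.3]]                       # 1 output node, 3 inputs
OUTPUT_BIASES = [0.05]

def network(inputs):
    """Type 2 view: what the network DOES. It maps an input to a score."""
    hidden = dense_layer(inputs, HIDDEN_WEIGHTS, HIDDEN_BIASES)
    return dense_layer(hidden, OUTPUT_WEIGHTS, OUTPUT_BIASES)

score = network([1.0, 2.0])
print(score)
```

The lists of weights are the Type 1 description; the single call `network([1.0, 2.0])` is the Type 2 description. Training, which this sketch leaves out entirely, is the process of finding weight values that make that function useful.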
In this Module we are interested in conveying the key ideas that lie at the heart of AI technology by using Type 2 explanations.
And we’re then going to discuss these ideas in detail in plain English and use plenty of examples and analogies to really crystalize each idea.
Is this realistic?
Yes, even complex ideas can be explained in plain English
One of the most important revelations I had some years ago was that although a given topic in science can seem impenetrably complex to a mind not trained in formal science (like mine!), the art is in seeing and understanding the key idea/s that are being conveyed.
Although essential for scientists, the tedious mathematics and pedantic language found in a typical research paper is like a facade that hides a few simple, constituent ideas that anyone can understand (at least for the sorts of topics we’re concerned with here).
The trick is being able to identify and understand the ideas and not obsess too much about the detail, at least not on a first pass.
Really great scientists have a gift of being able to explain their ideas to others simply and clearly.
As an aside I would say that one of the problems in physics today is that too many physicists are spending too much time discovering obscure mathematical constructs and then trying to pontificate about what those constructs imply about reality.
But I think this represents “back-to-front physics.”
I would say that too much theoretical physics today involves Type 1 thinking (above), and that is certainly true of AI.
The really great conceptual physicists, like Einstein, thought deeply in order to discover a set of ideas (assumptions) about some aspect of reality and they then tried to connect those ideas together in a rich conceptual narrative (using thought experiments, logical arguments and analogies that have meaning in the real world) which then gave them confidence that the resulting intellectual construct correlated well with actual reality.
Then they started with the mathematics, which is really another way of articulating the ideas they had already discovered and interpolating between them.
People like Einstein focused on Type 2 thinking, and that’s the approach we’re going to be using here.
Example 1: Software
To give you confidence that this is possible remember that a computer program is just another way to represent a set of ideas.
Even the most complex computer programmes are ultimately built using a set of rules, assumptions and decisions which are individually very simple and can each be written down in plain English.
You can think of a software programme as a flowchart. In fact, any computer program could quite literally be represented by a flowchart – maybe a pretty complex flowchart, but people would immediately recognize the result as a flowchart.
Anyone can understand a flowchart but the equivalent programme is harder because you need to understand the language used to encode that flowchart.
This means that you can understand any software idea as well as any programmer.
When you look at real code you might recoil in horror thinking “I could never understand that.” But when someone explains the purpose of a section of code then you will be surprised by how simple it really is.
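As a small illustration of that point, here is a sketch of a two-decision flowchart written as Python, with each box of the flowchart stated in plain English as a comment. The function name `classify_temperature` and the thresholds are hypothetical, chosen only to keep the example short.

```python
def classify_temperature(celsius):
    """A flowchart with two decision boxes, written as code."""
    # Decision box 1: is it below freezing? If yes, stop here.
    if celsius < 0:
        return "freezing"
    # Decision box 2: is it below the (assumed) comfort threshold of 18?
    if celsius < 18:
        return "cold"
    # Neither box said yes, so we reach the end of the flowchart.
    return "comfortable"

for reading in (-5, 10, 21):
    print(reading, classify_temperature(reading))
```

Read the comments alone and you have the flowchart; read the code alone and you have the same flowchart encoded in a particular language.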
Don’t be afraid of software or code (see Module 3 for a detailed discussion of the differences between software, code and a program).
Another way to see that even the most complex software is ultimately composed of simple ideas is to realise that when computer scientists or AI researchers have their ‘bright ideas’ they have those ideas in English, not in Python or Scala.
There are no really complex ideas in software, at least not within the computational paradigm that we currently live in.
Example 2: Mathematics
And as far as the math goes, then it’s the same: a computer programme is quite literally a very particular type of mathematical structure (logic being a mathematical discipline) so we can also explain even advanced mathematical ideas in English as well.
So in this Module we are going to effectively convert advanced AI code and math back into English and then summarise the resulting volume of text down into 10 common, underlying ideas which we will then explain and discuss.
This will then provide us with another layer of thinking that will sit on top of that which we developed in Module 3.
I’d like to explain why Module 4 is so important…
AI is shifting from math and code to ideas and experimentation
Over the last few years AI has moved into a phase where experimenting is now as important as, if not more important than, mathematics and theory.
This shift has taken place because the sophistication of neural networks today means that there are so many architectural decisions to make – before you even begin to train the network – that you are forced to make a somewhat arbitrary set of decisions (i.e. decide on a set of hyperparameters) and see what happens. And then you need to be prepared for failure and start again.
So powerful and so easy to use are cloud-based AI/ML development tools that it is easier to test an idea by building something or making some changes and then seeing what happens – than it is to sit down and try to predict what will happen using math.
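The "pick some hyperparameters and see what happens" loop described above can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not how a real framework does it: the model is a single weight `w` fitted to made-up data, the only hyperparameter being varied is the learning rate, and the candidate values and the helper name `train` are invented for the example.

```python
def train(learning_rate, steps=50):
    """Fit y = w * x to toy data with plain gradient descent; return the final loss."""
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # toy data generated with w = 2
    w = 0.0
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= learning_rate * grad
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

# The "experiment": try several learning rates and compare the outcomes.
results = {lr: train(lr) for lr in (0.001, 0.01, 0.1, 0.5)}
for lr, loss in results.items():
    print(f"learning_rate={lr}: final loss={loss:.6g}")

best = min(results, key=results.get)
print("best learning rate in this run:", best)
```

With this toy problem the largest rate overshoots on every step and the loss grows instead of shrinking, while the smallest rate has barely moved after 50 steps. Here the math is simple enough to predict that in advance; the point of the passage above is that for a real network it usually is not, so you run the loop instead.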
There are three reasons why people think first of experimenting and last of trying to think things through using math:
- I don’t have the time: People working in real companies who are under commercial pressure to develop and launch an actual product simply do not have the time to spend days or weeks on a pointless theoretical crusade, when modern tools have features that can be used to build a new AI model in a matter of a few hours and just see what happens.
- I don’t have the math skills: Many (but not all) AI developers working at the coal face simply do not have the requisite math skills to know how to do this anyway. This is not a criticism at all or an implication that they are not smart enough – it’s just a natural implication of them having not yet invested the time needed to understand the language of mathematics.
- It’s too hard anyway: We are now at the point where the behaviour of complex AI systems is too hard to analyse mathematically – by anyone. Here’s an example: when considering changing the activation function in a given layer from, say, “Tanh” to “Leaky ReLU” (a technical glossary will be provided with this section so all these acronyms will be explained), you might have some intuitive feel for how that change might affect performance, but you really don’t know for sure. It’s quite likely that you will get a surprise – an outcome that was not foreseen and you’ll probably not understand the reason. Worse, the reason might, in fact, not be possible to understand – by anyone. The math is just too complex and the relationship between the change and the semantic details of the problem is simply unknown.
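For readers who have not met the two activation functions named in the last point, here is a minimal sketch of both in plain Python. The negative slope of 0.01 for Leaky ReLU is an assumption (it is a commonly used value, but it is itself a hyperparameter you can change), and the sample inputs are arbitrary.

```python
import math

def tanh(x):
    """Squashes any input into the range (-1, 1)."""
    return math.tanh(x)

def leaky_relu(x, negative_slope=0.01):
    """Passes positive inputs through unchanged; scales negative inputs by a small slope."""
    return x if x > 0 else negative_slope * x

for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"x={x:+.1f}  tanh={tanh(x):+.4f}  leaky_relu={leaky_relu(x):+.4f}")
```

Each function on its own is simple to describe. What is hard to predict is the effect of swapping one for the other inside a large trained network, which is why people tend to make the change and measure the result.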
A few words about ‘black arts’
A field that I know a fair bit about, RF and microwave circuit design, has long been perceived to be something of a ‘black art’.
RF and microwave engineers have to accept that after the calculations and modelling are done, there will remain some residual level of fiddling, a need to apply a rule-of-thumb or just some plain bodging.
The reason for this messiness is that what is portrayed on the paper circuit diagram is not the actual physical circuit. This could be because the RF energy wants to escape from the physical structure that is the circuit and then interfere with what other, nearby circuits are doing. Or it might be because the signal is changing so quickly that the time needed for those changes in information to travel just a few mm (at about the speed of light) needs to be taken into account. Suddenly, innocuous-looking tracks on a circuit board become electronic components in their own right and play a critical role in how the circuit works.
Some RF circuits can be extremely sensitive – if you wave your hand above a working circuit the performance can change wildly. Or if you disconnect the input and connect it again then the performance changes (probably a slightly dirty connector). Scary stuff.
This complexity – the ‘black art’ bit – arises because it’s not practical (or even possible) to model the reality of an actual circuit situation perfectly.
AI is also developing a reputation as something of a ‘black art’ where the best designers rely as much on experience, rule of thumb and an understanding of what worked in the past as they do on theory, data and fact.
But AI is far more complicated than RF and microwave circuit design.
And I would argue that it’s a lot blacker than RF and microwave circuit design.
With RF and microwave we understand the underlying physical mechanisms that define how a given circuit behaves (e.g. Ohm’s law, Maxwell’s equations, inductive mechanisms, transmission lines, electrical fields etc.). The actual circuit (meaning the physical realization of the circuit, as opposed to what is on a circuit diagram) is too complex to model perfectly, but we can get close and we have full visibility of the underlying mechanisms.
RF is complex, but not really mysterious.
But AI is another level of complexity completely.
With AI, we do not even know what the underlying mechanisms are: there are presently no theories at all that link certain design decisions with the informational content or semantic meaning of the data structures we are analyzing. We’re kidding ourselves if we believe we understand this – because we simply don’t.
With the RF amplifier above, I know that if I run the power track too close to the output track then some of the RF energy will be coupled onto that track and could find a way back to the input of the amplifier, causing it to oscillate. I can’t easily calculate the exact magnitude of this effect but I know that this could be a problem so I have to be careful when laying out the circuit board.
If I really wanted to then I could build a test circuit to measure this effect for a given circuit layout, but most practicing engineers don’t have time to bother with this.
The equivalent in a deep neural network would be knowing that, for example, if too many nodes in a given layer have a Sigmoid activation function then that will lead to the network becoming unstable during training and failing to converge.
With the RF circuit above we understand the electrical mechanism that will lead to instability, but with the neural network we do not understand the mathematical mechanism that will lead to this undesired, unstable behaviour during training.
Along with some mathematicians I’ve spoken to, I personally have a big problem with this, and we’ll get into the detail of why in this Module, but one practical concern is that very complex, high-performance AI systems are capable of recognizing patterns that we have inadvertently trained them to recognize – even though those patterns are invisible to us.
The fact that we do not know the full set of false patterns that a given AI can mistake for real patterns means that many leading AI systems contain serious bugs which might manifest as gross misclassification errors in certain particular situations that could conceivably be exploited by nefarious actors.
It is easy to imagine application examples where this could have very serious consequences.
So why is any of this significant for you?
It’s significant for you because the very advanced math, software and technology that defines the AI scene is not as honed and well-understood as you might think.
You should be respectful and gracious, but don’t be fooled: what’s under the hood is still very basic.
AI is a very new, fast-developing field and there is a long way to go with plenty of opportunity for people who deeply understand the technology and business context to help shape the field.
This means that you could well have architectural ideas – which means ideas about how to better solve an AI problem – that are just as valid and useful as those of the engineers working on an AI development project.
So I guess this is a bit of a confidence boost – but you have to understand the basics first, which is what this Module is all about.