Intro to LLMs and Generative AI

In my first post, I mentioned the emergence of generative AI - ChatGPT, Bard, and friends. But what are these mysterious black boxes that can produce creative text and seem almost indistinguishable from human intelligence?

Let's break it down. This is the first in a series of three posts explaining how these models work:

  1. The basics of Artificial Intelligence and Machine Learning (this post)
  2. Deep Learning and Neural Networks
  3. Large Language Models

What's Artificial Intelligence and Machine Learning?

Both terms are hot buzzwords these days, and there are many definitions for both. John McCarthy, one of the founding fathers of computer science and the man who coined the term, simply defines Artificial Intelligence (AI) as
the science and engineering of making intelligent machines

Well, that didn't help! But that's really what it is: making machines act intelligently - in other words, making them act like humans. AI is an umbrella term for different fields and techniques pursuing this goal in many different ways. I like IBM's definition, which makes it a bit more tangible:

artificial intelligence is a field, which combines computer science and robust datasets, to enable problem-solving

This definition emphasizes an important piece: data. Which brings us to Machine Learning (ML), which is a branch of AI that focuses on the use of data and algorithms to imitate the way that humans learn, gradually improving its accuracy (definition).

What this practically means is that we take a dataset of known samples (or facts) and feed it into an algorithm that creates a model representing these facts. Such models are abstractions and generalizations of reality, which can be used to explain further examples - we call these predictions. Let's look at an example of supervised learning, a sub-category of machine learning where each example is a combination of measurable facts (called features) and a label.

Imagine a set of images of animals, each labeled with the name of the animal it contains. 

We feed a lot of those images (and I mean truly a lot, at least thousands or better millions) into our ML algorithm, and it figures out how the relationship between each image and its label can be described. This is a fuzzy process - the algorithm doesn't simply store every image of a dog, but rather builds a model (an abstraction) of what a dog looks like. In other words, it doesn't need to have seen every single dog in the world to decide whether an animal is a dog.

Once our model has been trained on our dataset, we can feed it new samples and it uses its knowledge to predict their labels. In other words: animal images in and their names out.
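This train-then-predict pattern is easier to see in code. The sketch below is not how an image classifier actually works - it's a toy nearest-neighbor model on made-up two-number "features" - but it shows the same workflow: feed labeled samples in, then ask for labels of new samples. All names and data here are invented for illustration.

```python
# Toy "train on labeled samples, then predict" workflow.
# Real image models learn abstractions; this nearest-neighbor
# sketch only illustrates the fit/predict pattern.

def train(samples):
    """'Training' here just stores the labeled samples."""
    return list(samples)

def predict(model, features):
    """Predict the label of the closest known sample."""
    def distance(sample):
        known, _label = sample
        return sum((a - b) ** 2 for a, b in zip(known, features))
    _, label = min(model, key=distance)
    return label

# Tiny made-up dataset: (features, label) pairs.
dataset = [
    ((1.0, 1.0), "cat"),
    ((1.2, 0.9), "cat"),
    ((5.0, 5.2), "dog"),
    ((4.8, 5.1), "dog"),
]

model = train(dataset)
print(predict(model, (5.1, 5.0)))  # → dog
```

Animal features in, animal names out - even for feature combinations the model has never seen before.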

Machine Learning Models

So what are these mysterious abstractions we call models? Some of them are actually pretty simple, as I'm going to show with a tiny dataset. Two simple weather-based features (outlook and wind) explain when a particular person might want to play golf (the label):

Outlook   Windy  Play Golf?
Sunny     True   Yes
Overcast  False  Yes
Overcast  True   No

Rule Sets

One of the most natural ways for us humans to explain this type of data is a set of IF ... THEN ... rules. The following three rules fully explain our dataset.

Play Golf?
  IF Outlook = Sunny THEN Yes
  IF Outlook = Overcast AND Windy = False THEN Yes
  IF Outlook = Overcast AND Windy = True THEN No

This doesn't really look like rocket science - although automatically generating such rules from arbitrary data is indeed complex, and we won't cover it here. The point is that these rules are in fact a more abstract representation of our data: our dataset is quite small, but the same three rules could explain thousands of similar rows. Moreover, they can be used to predict labels for data we haven't seen before.
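Translated into code, the rule set is nothing more than plain conditional logic. A minimal Python sketch (the function name is mine, not anything standard):

```python
# The three golf rules from the dataset, written as IF/THEN logic.

def play_golf(outlook, windy):
    if outlook == "Sunny":
        return "Yes"
    if outlook == "Overcast" and not windy:
        return "Yes"
    if outlook == "Overcast" and windy:
        return "No"
    return None  # no rule covers this input

print(play_golf("Sunny", windy=True))  # → Yes
```

Note the last line: for inputs the rules don't cover (say, a rainy day), this model simply has no answer - a limitation of such a small training set.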

Outlook   Windy  Play Golf?
Sunny     False  Yes

Whether accurate or not, this is an actual prediction - the computer just derived new information out of thin air, and it might even be wrong (what, machines can lie?). This is very different from the stereotype of a machine as a glorified calculator that just does basic math, albeit much faster than we can. This is the essence of AI: a machine doing something that looks intelligent to us.

Decision Trees & Decision Forests

Another way of representing our model is as a decision tree.


If we want to predict a label, we simply follow the branches from the root (which, confusingly, sits at the top) that match our input data. The leaf we end up at (a node at the bottom) is our predicted label.

Decision trees are just another representation of rule sets - you can convert every path from the root to a leaf into an IF ... THEN ... rule. The nice thing about them is that they're visual. And that's an important point: there are models we can easily comprehend as humans, and others that look more like black boxes to us (neural networks are like that, as we'll see in the next post).
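To make the tree concrete, here's one way to represent it in code - a hedged sketch using nested dictionaries (there are many equivalent representations; this one is just compact):

```python
# The golf decision tree as nested dicts.
# Inner nodes test a feature; leaves hold the predicted label.

tree = {
    "feature": "outlook",
    "branches": {
        "Sunny": "Yes",  # leaf
        "Overcast": {
            "feature": "windy",
            "branches": {True: "No", False: "Yes"},
        },
    },
}

def predict(node, sample):
    """Follow the branches from the root until we reach a leaf."""
    while isinstance(node, dict):
        node = node["branches"][sample[node["feature"]]]
    return node

print(predict(tree, {"outlook": "Overcast", "windy": False}))  # → Yes
```

Each root-to-leaf path corresponds exactly to one of the three IF ... THEN ... rules above.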

While decision trees can be nicely visualized, single trees don't generalize well - keep in mind that in reality, the training dataset probably has hundreds of columns and millions of rows. Decision forests combine multiple trees, each trained on a slight variation of the underlying training data; the overall prediction is simply an aggregation of the individual trees' predictions. Decision forests are more robust than single trees and were all the hype in the 2000s.
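The aggregation step is simple enough to sketch. Below, three hypothetical "trees" (stand-ins for trees trained on different slices of the data - the rules in them are invented for illustration) vote, and the majority wins:

```python
from collections import Counter

# Three hypothetical trees, each standing in for a tree trained on a
# slightly different slice of the data, represented as functions.
def tree_a(sample):
    return "Yes" if sample["outlook"] == "Sunny" else "No"

def tree_b(sample):
    return "No" if sample["windy"] else "Yes"

def tree_c(sample):
    return "Yes"  # this tree always says yes

def forest_predict(trees, sample):
    """Aggregate the trees' predictions by majority vote."""
    votes = Counter(tree(sample) for tree in trees)
    return votes.most_common(1)[0][0]

sample = {"outlook": "Sunny", "windy": True}
print(forest_predict([tree_a, tree_b, tree_c], sample))  # → Yes (2 votes to 1)
```

Because one overconfident tree gets outvoted by the others, the forest as a whole is less sensitive to quirks in any single tree's training data.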

Artificial Neural Networks

Artificial Neural Networks (or just neural networks) are modelled after the human brain and are, perhaps for that reason, the most powerful models today. I'll cover these in more detail in the next post in this series.

Other Techniques

There are many other models, and they differ in how they represent their knowledge, how complex the concepts are that they can represent, how well they generalize, and whether humans can easily understand them. Linear regression finds the best linear formula (something like y = a * x1 + b * x2 + c) to predict numeric labels, Bayesian classifiers use probabilities to estimate how likely each possible label is, and Support Vector Machines search for hyperplanes (lots of math here) that best separate the labels. And there are many more approaches, including techniques that combine several classifiers (like the decision forests above).
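For a taste of one of these, linear regression with a single feature has a well-known closed-form solution (ordinary least squares). A small sketch, with made-up data points:

```python
# Ordinary least squares for one feature: find a and b so that
# y ≈ a * x + b fits the data points as closely as possible.

def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by variance of x.
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    # Intercept: the line must pass through the mean point.
    b = mean_y - a * mean_x
    return a, b

# Points that lie exactly on y = 2x + 1.
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # → 2.0 1.0
```

Here too, the "model" is just two learned numbers - an abstraction of the training data that can then predict y for any new x.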

Conclusion

This quick overview hopefully shed some light on AI and ML and demystified these buzzwords. There's nothing magical about what's happening here. Most models aren't even overly complicated and can be easily understood with a bit of background in math. I'll dive deeper into neural networks in my next post, since they're the underlying models powering ChatGPT, Bard, and all the other large language models out there.

One final thing: Most models (and the ML algorithms creating them) are exhaustively researched and can be used out of the box in many programming languages. I'd go as far as saying "creating a model" is mostly a solved problem. What distinguishes a good model from a bad one, then? It's what feeds them all: the training data. Getting the data right is a huge task, and as much art as it is science. That's where we humans can still make a big difference.
