Large Language Models
This is the third post in a series that goes from the basics of machine learning to state-of-the-art large language models (ChatGPT, Bard and friends). Here are the links to the entire series:

The basics of Artificial Intelligence and Machine Learning
Deep Learning and Neural Networks
Large Language Models (this post)
The Transformer Architecture

As explained in my previous post, neural networks are an ML model designed after the blueprint of our brain, capable of representing complex relationships and hence deep knowledge. The structure of such a neural network (how the artificial neurons are connected, or in mathematical terms the layout of the network graph) is what we call its architecture.

Over the last decade or so, ML researchers have found better and better architectures for a number of different tasks, such as computer vision or language understanding. The real-life analogy is how the different parts of our own brain are wired to perform specific tasks like vision, memory or ot