In this episode, we explore some of the opportunities and complexities you’ll encounter when training a large language model. We examine what happens, at a technical level, when you train a model, and why training can deliver major gains in cost and efficiency for many of the problems you’d want to solve with generative AI.
A real-life example of how we cut training times in half for a large AI SaaS - https://www.missioncloud.com/case-studies/how-leading-ai-saas-lowered-training-costs-for-its-llm-and-reduced-loading-times-with-aws
Simon Willison’s article on Embeddings - https://simonwillison.net/2023/Oct/23/embeddings/
AWS’s blog on using FSx and p4d instances to train large language models - https://aws.amazon.com/blogs/storage/accelerating-gpt-large-language-model-training-with-aws-services/
Latent Space’s podcast on fine-tuning CodeLlama-34B to beat GPT-4 in coding - https://www.latent.space/p/phind#details
AWS’s blog on fine-tuning and deploying Mistral-7B with SageMaker Jumpstart - https://aws.amazon.com/blogs/machine-learning/fine-tune-and-deploy-mistral-7b-with-amazon-sagemaker-jumpstart/
Welcome to Mission Generate, the podcast where we explore the power of AWS and the cutting-edge techniques we use with real customers to build generative AI solutions for them. Join us each episode as we dig into the technical details, cut through the hype, and uncover the business possibilities.
I'm Ryan Ries, our generative AI practice lead and your host.
In today's episode, we're going to talk about training and fine-tuning your model.
We're going to cover a lot of technical concepts in this episode, a bit more than we usually do, but we can summarize this topic quite easily—when you want a large language model to do something specific, often the most efficient way to scale a solution is to train the model explicitly on the data you need it to understand and the ways you intend for it to respond.
Training at this point is still as much art as it is science, but we will break down how it works and why it matters. Let's begin.
Let's start with the disclaimer as usual. This is not the "actual" Casey and Ryan having a conversation. Our voices are being synthesized by generative AI to speak to you, a feat that we thought was an interesting way to explore building a podcast on the same subject. And now, with that out of the way, let's get back to our topic.
Ryan, I gotta level with you here. Before I started researching for this episode, this confused the heck out of me. Training, fine-tuning, embedding. It often seemed like we were using different words to describe the exact same thing. Before we get too deep, can we just start with the differences between those, because a lot of the literature seems to use them interchangeably. And that word, "training," it just feels so big. Can you break this down?
I appreciate you coming clean, Casey.
Look, I'm going to be blunt. If you work with us, you'll find I'm pretty good at that, by the way. So, I've been in AI and machine learning for over twenty years. Yes, I may be dating myself here, but it's true.
So you're saying you were doing AI before it was cool?
That's exactly what I'm saying. I'm an AI hipster. There, I admitted it.
But yeah, I've been in this field for twenty years. It used to be almost purely academic at one point. And a lot of the field still advances at the pace of academic research. Which tends to take words you thought you knew and change them to mean new things in this context. For a layman or an outsider, this can be hard to follow.
So let's define terms like you suggested, and help people wrap their heads around these words. Let's start with embedding.
Embedding is the process of taking data—in most cases, textual data—and turning it into vectors. Vectors are like a series of coordinates in space. You can think of vectors as being a bunch of points in a cloud—not the AWS kind of cloud, just a cloud cloud—and the nearer those points are together, the closer in relation that vectorized text is.
I feel like we should just jump right to the mind bending part of this—that metaphor, points floating in space, makes a lot of sense in my mind's eye. But then I was reading about the number of dimensions, or coordinates, within each piece of data for embedding. And it can be like 500, or even a thousand. A thousand-dimensional vector. Is that right?
Well, yeah, the numbers get pretty crazy pretty fast.
Right, so it's not just like points in a cloud. It's like some points could be furry, and others could be bald. That would be one additional dimension. And then another could be what color they are. And then another could be how soft or spiky they are, or something like that.
And I know I'm probably sounding crazy at this point, but we're only up to like six dimensions out of five hundred with this metaphor.
Pretty trippy, Casey. I like it.
But yeah, embedding is that process of taking textual data and transforming it into this complex, multi-dimensional set of numbers. And basically, each of the numbers in each vector communicates some kind of relational status according to the proximity of their coordinates. In one sense, you could say that the nearer together they are, the more closely they relate.
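To make that "nearer means more related" idea concrete, here's a minimal sketch using toy three-dimensional vectors. Real embeddings have hundreds or thousands of dimensions, and these particular numbers are invented purely for illustration:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: close to 1.0 means the vectors point the same
    # way (closely related); lower values mean less related.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" -- invented values for illustration only.
kitten = [0.9, 0.8, 0.1]
cat = [0.8, 0.9, 0.2]
tractor = [0.1, 0.2, 0.9]

print(cosine_similarity(kitten, cat))      # high: closely related
print(cosine_similarity(kitten, tractor))  # much lower: distantly related
```

A real system would get these vectors from an embedding model rather than writing them by hand, but the distance math works the same way.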
Some people may find this off-putting, but we actually still don't know exactly what's being encoded with this process of embedding. But this is the basic math behind forming a large language model.
Yeah, this is where I really started to lose my mind. I was reading a phenomenal blog by Simon Willison, which we'll put in the show notes, and he gives an example of what you can do with these vectors mathematically. He puts it this way:
Take the vector for “germany”, add “paris” and subtract “france”. The resulting vector is closest to “berlin”!
Ryan, honestly, what the heck, dude?
It is pretty crazy, right!?
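You can sketch that arithmetic with toy two-dimensional vectors. These values are deliberately constructed so the analogy works out; a real embedding model learns this kind of structure from data, in hundreds of dimensions:

```python
# Toy 2-D "embeddings" invented so the capital-city pattern holds;
# real models learn structure like this from training data.
words = {
    "france":  [1.0, 0.0],
    "paris":   [1.0, 1.0],
    "germany": [2.0, 0.0],
    "berlin":  [2.0, 1.0],
}

def nearest(vec, exclude):
    # Find the word whose vector is closest to vec (squared Euclidean
    # distance), skipping the words used to form the query.
    def dist(w):
        return sum((a - b) ** 2 for a, b in zip(words[w], vec))
    return min((w for w in words if w not in exclude), key=dist)

# germany + paris - france, component by component
query = [g + p - f for g, p, f in
         zip(words["germany"], words["paris"], words["france"])]
print(nearest(query, exclude={"germany", "paris", "france"}))  # berlin
```

Excluding the query words before searching is standard practice when evaluating analogies like this, since the inputs themselves are often the nearest vectors.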
So, to sum up embeddings, we might say it's the process of turning language into math. Sticklers will have a more precise definition than that, but that's enough to get us started.
So that's embeddings. If you've heard about it when people talk about generative AI, it's because this vectorization process undergirds the whole technology. The relations between these vectors are really what a large language model is parsing when trying to respond to your query.
It's taking that incredibly complex, multi-dimensional vector that your string of words represents and trying to figure out what follows logically.
This means, you need embedding just for LLMs to exist at all. And, as it turns out, if you're going to be doing any kind of training of a model, you're going to do so with embeddings. You need to get that data into a representational form that allows the model to capture its semantic relationships.
Which takes us to training.
Think of training as how a model comes to understand the patterns and relationships denoted by your vectors. You do this by adjusting a model's parameters, the numeric weights inside the model, based on the data you feed it. In generative AI, the underlying architecture doing this work is the transformer, which is probably a bit out of scope for this talk.
But let's put it this way, to train a model to do anything in the first place, you're using embeddings to represent your data, and your parameters affect how it ingests and derives relationships or meaning from that data.
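Stripped down to its bare bones, "adjusting parameters" looks something like this toy sketch: a model with a single parameter, nudged step by step to fit its data. Real LLM training runs this same loop over billions of parameters:

```python
# A one-parameter "model": predict y = w * x. Training repeatedly nudges
# w to reduce the prediction error on the data.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # the true relationship is y = 2x

w = 0.0  # the model's single parameter, starting from scratch
learning_rate = 0.05

for step in range(200):
    # Gradient of mean squared error with respect to w.
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= learning_rate * grad  # adjust the parameter against the gradient

print(round(w, 3))  # converges toward 2.0
```

The point of the sketch is just the shape of the process: feed data through, measure the error, and adjust the parameters a little at a time.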
Okay, so let's come back down to planet earth for a second here, Ryan.
Why should anyone care about training? So far, this all sounds like stuff that happens behind the curtain. I mean, we bothered to create a podcast episode about it, so it must be important, right?
What's the big deal?
Yeah, training is a big topic, but if you're thinking about building something with a generative AI model, here's why training matters to you:
All that math I just described is massively expensive to do. When you hear about models scraping the entire internet, this is what they're talking about. These model makers are trying to create numeric data sets out of the written words of every human on the planet. And programs like those aren't cheap to run, as you might imagine.
When people think about generative AI, most of them are only going to have experience with a general-purpose model, like ChatGPT.
And if you were trying to build a solution that could do anything and everything, then yeah, taking the entire internet and feeding it to your massive network of servers, like they did, would probably be the right starting place.
But for 99% of us, that's not what we're trying to do at all! Most of my customers are trying to solve a problem, a specific problem. And as wonderful as these general purpose models are, you probably don't need all of that power for your problem.
Which is my polite way of saying, a lot of folks are overpaying for the problems they want to solve. This can really sneak up on you.
Let's say you're working with a model and spending just pennies at a time. You may not even realize that by the time you take that and turn it into a full-scale solution, you're spending thousands and thousands of dollars a day. Your approach made sense when you were small and experimenting. But does it make sense at scale? You really need to check your math on this.
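Checking that math can be as simple as multiplying a per-request cost by your real production load. The numbers below are hypothetical, for illustration only; plug in your own:

```python
# Hypothetical numbers for illustration only -- substitute your own.
cost_per_request = 0.04    # dollars: feels like pennies while prototyping
requests_per_hour = 2_000  # a modest production load
hours_per_day = 24

daily_cost = cost_per_request * requests_per_hour * hours_per_day
monthly_cost = daily_cost * 30

print(f"${daily_cost:,.2f} per day")      # $1,920.00 per day
print(f"${monthly_cost:,.2f} per month")  # $57,600.00 per month
```

Four cents a call looks like nothing in a notebook; at production volume it's a five-figure monthly bill.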
Yeah, so what you're saying is that, without training, you have to buy the whole Swiss army knife, even if the only thing you need is the can opener.
Exactly. Let's sketch an example of this. If you were trying to build an AI agent that could help you research and forecast about the movements of financial markets, would you consider it important that it be able to explain to you French literature, dog-walking techniques, or the lifecycle of common mushrooms?
In a general-purpose model, all of that is in there, somewhere. Now, it's true, you need some level of generalization to be able to understand complex natural language questions. But if you were to fine-tune your model, you may be able to run something a thousand times smaller than GPT, and get that AI financial analyst producing compelling answers.
Fine-tuning is training, but a specific kind—it's the process of specializing a model on a specific data set and to a specific end. It's about making a model a specialist in the domain of your application instead of a generalist in everything. Once again, you're using embeddings to do this. So they're still in the mix. But now, you're trying to bias those parameters heavily toward whatever you intend to accomplish.
What should excite folks about fine-tuning is that this means you can work with much smaller models, models that could be run on a single computer, instead of a huge network of servers, and still get compelling results. So fine-tuning can let you side-step a lot of scaling issues: cost, latency, data privacy, accuracy. Think about how much easier all of these become if you're running your own model and teaching it to work with just the data you care about and nothing else.
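Here's a toy sketch of that tradeoff, reusing the idea of a one-parameter model trained by gradient descent. None of this is a real LLM workflow; it just shows why starting from an already-trained model and nudging it toward your domain takes far fewer steps than training from scratch:

```python
# Toy illustration of fine-tuning: continue training from an existing
# parameter instead of starting over. Not a real LLM workflow.
def train(w, data, steps, lr=0.05):
    # Gradient descent on mean squared error for the model y = w * x.
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

general_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # "general" pattern: y = 2x
domain_data = [(1.0, 2.2), (2.0, 4.4), (3.0, 6.6)]   # your niche: y = 2.2x

# Pretraining: the expensive part, done once, on general data.
pretrained = train(0.0, general_data, steps=200)

# Fine-tuning: a handful of steps starting from the pretrained parameter.
fine_tuned = train(pretrained, domain_data, steps=10)

print(round(pretrained, 2))  # about 2.0
print(round(fine_tuned, 2))  # about 2.2
```

Ten steps of fine-tuning get the model onto the domain pattern because it isn't starting from zero; it's starting from everything pretraining already captured.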
One thing we probably should clarify at this stage is that fine-tuning itself isn't free, either. Getting that data into a model and refining it is going to be a process of multiple iterations of heavy compute time. There's no free lunch here. But it does seem like there's an interesting tradeoff—if you can invest the time and compute up front to specialize a model, you're going to be repaid with a lower cost of ownership, and a solution which can scale more efficiently.
So in the long run, this approach can be far more cost-effective. But it also means you really have to care about the infrastructure that trains your model, and about using it efficiently.
Yes, I think that's right, Casey. Like many things in this space, there's no single right answer to building a solution. But there are opportunities. If you know your solution is going to be hit with thousands of prompts an hour, that the context you'll need makes API pricing painful, or it really needs to live inside a single instance for architectural reasons—these are all situations where you should seriously consider fine-tuning.
So let's sum up here. Embedding is the process of taking data and turning it into something that a machine learning model can work with. Training is feeding that model all of that data so that it can become a generative AI—a large language model. But fine-tuning is where the magic happens. That's where you take a model, which may have been trained in any number of ways, but you begin refining it to work just on your problem domain. This is how you can get the performance and intelligence of a much larger model without having to pay the associated costs in complexity and infrastructure.
Well said, Casey. As we end here, I'd like to pitch to the builders for a second.
Hopefully we broke down these concepts in a way that was easy to understand and think about. But with that said, we know data architecture is hard. And that's what it takes to do real fine-tuning of a model.
If you've been working on something with a generative AI, but the economics don't make sense yet, you still have an opportunity here. And if you'd like to talk to someone about how to capitalize on that, by fine-tuning a model for instance, reach out to us. We'll give you an hour of time, free of charge, just to learn about your solution and help you think through how to architect for training and specialization.
We love taking difficult problems and making them simple, and we can help you calculate your Total Cost of Ownership ahead of any endeavor, so you can know exactly what you'll be paying when you reach production. Check us out at mission cloud dot com to see how we've done exactly that for hundreds of different customers of AWS.
Good luck out there, and happy building!