Mission: Generate

Welcome to a special guest episode of Mission: Generate, where we dive into AWS's role in developing state-of-the-art generative AI solutions with real-world applications. Dr. Ryan Ries and Casey Samulski are joined by Dan Padnos, VP of Engineering at AI21 Labs, to discuss their advancements with Jurassic-2 and other large language models, addressing challenges like hallucinations and the intricacies of engineering production-grade AI systems. Tune in for an engaging dialogue between real humans this time, exploring the frontiers of AI technology and its potential to revolutionize industries.

Show Notes:

Official Transcript:

Casey
Welcome to Mission: Generate, the podcast where we explore the power of AWS and the cutting-edge techniques we use with real customers to build generative AI solutions for them. Join us each episode as we dig into the technical details, cut through the hype, and uncover the business possibilities. Now, I am not Ryan Ries, as you may have noticed, but his co-host and Senior Product Marketing Manager, Casey Samulski. That's because today we have a special guest episode where we'll be interviewing Dan Padnos, VP of Engineering at AI21 Labs, to talk about Jurassic-2 and the other large language models they've developed. We'll be discussing hallucinations, the challenges of engineering production-grade solutions, and the future of AI. It's going to be a pretty cool episode. As you may have guessed, it's going to be real humans talking, just like my voice right now. What a nice treat for us. Who knows, maybe we'll get another generated episode out one of these days, but let's start the conversation. Dan, we're super excited to have you join us.

Casey
Thank you so much for coming on the podcast. We're really excited to have AI21 Labs on today. And we'll just start with introductions. I'm Casey Samulski, I lead product marketing at Mission, and I co-host the podcast with Dr. Ryan Ries, who I will let introduce himself.

Ryan
Hey everyone, Ryan Ries here, excited for another edition of the Mission: Generate podcast. Another real-life edition, coming to you again since re:Invent. As you guys know, I lead data science strategy here at Mission, and I'm excited to talk with our friends at AI21. Dan, if you want to give an introduction of yourself?

Dan
Sure. Hi, Casey. Hi, Ryan. And hello to everyone listening. I'm Dan, and I lead the platform unit at AI21 Labs. I guess we'll talk about that more in a few minutes. I'm excited to be here. Thanks for having me on the podcast.

Casey
Yeah. Thanks so much for being here. Dan, can you tell us a little bit more about what you do at AI21 Labs, and also, just more generally, what the company is about and what your mission is?

Dan
Sure. So why don't I start with the company? AI21 is a startup. We're 200-some people, mostly R&D, mostly based in Tel Aviv, with a focus on natural language processing and natural language generation. And nowadays, this means everything to do with large language models. The company started out as a sort of dual research lab and product company. Our first products were actually in the B2C space. We launched Wordtune as probably the first really massive generative AI application. It's a writing assistant that helps you find the best phrasing for what you want to say, with a bunch of tools that help you write more productively. And that was our first product. But from the beginning, our goal was to become really a platform company that builds foundational technology that others can build on.

Dan
And we built Wordtune on top of our own language models. So two and a half years ago, we launched our platform offering based on our large language model. At the time, it was Jurassic-1; now we're at Jurassic-2. And it was, you could say, basically the first publicly open API for a language model of the scale of GPT-3. GPT-3 at that time was a closed, sort of invite-only party, and we came out with Jurassic-1, a 178-billion-parameter model, that was open in general availability. And yeah, the story sort of took off from there. Nowadays on our platform offering, you can find general-purpose large language models that follow instructions and handle a large variety of tasks in different languages, as well as what we call task-specific models that focus on specific tasks and really optimize performance, cost, reliability, etc. for those specific tasks.

Casey
Since you mentioned this, I do think one of the things our audience is aware of is that there are differences between models. But I think if you're not deeply intimate with the space and the technology, it's a little bit hard to understand how different models really are from one another, apart from maybe being a little bit more insightful or accurate in their answers. So can you talk a bit about what the actual differences are between given language models, and why you might want to choose one model, like a model from your company, versus one from the other model makers?

Dan
Yeah, absolutely. I'll start with a few general statements. So the first one is that figuring out what a given language model is good at, and essentially evaluating it, is a pretty challenging and delicate task, especially if what you're after is a capability that involves generating even short texts, but especially long texts. Because consider a summary, even a relatively short summary of a relatively short document. There are so many dimensions by which you may judge the summary to be good or bad. And it's not always easy, when you see a summary that came out of a given model, to say: this is exactly what I need, or this is off in some way. And that's even before we go into some of the more delicate stuff that really involves a lot of sweat and tears to figure out, such as: is this model hallucinating?

Dan
Is it making subtle factual errors? And what are they? How likely are they? How does that change as a function of the input? So I think if people are feeling a little bit confused and overwhelmed and uncertain as to what they should go with, I can definitely understand why that is. It's a byproduct of the nature of the tasks that these models are built to handle and how complex they are. So that's one general statement. The other general statement is that models behave differently, even on the same task. Models from different providers, or different open source models, will give different outputs and exhibit different behaviors. And there's no substitute for actually trying them out on your tasks and seeing if they fit your specific requirements, right? Because Casey's requirements may be different from mine, even if we both call our task summarization, because Casey is looking at a business use case in finance and I'm looking at something in retail, and those look different and we have different expectations about the output.

Dan
So that's general statement number two.
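
Dan's point that there's no substitute for trying models on your own task lends itself to a very small harness: send the same prompt to two candidate models and compare the outputs side by side. A minimal sketch in Python, assuming the AI21 Jurassic-2 request format on Amazon Bedrock; the model IDs, region, and response fields are assumptions to verify against the current Bedrock documentation.

```python
import json
import boto3

# Send one prompt, representative of YOUR task, to two candidate models and
# compare the outputs by eye (or with your own scoring) before committing.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

prompt = "Summarize the following support ticket in two sentences:\n..."  # your task here

for model_id in ["ai21.j2-mid-v1", "ai21.j2-ultra-v1"]:  # assumed model IDs
    response = bedrock.invoke_model(
        modelId=model_id,
        contentType="application/json",
        accept="application/json",
        body=json.dumps({"prompt": prompt, "maxTokens": 200, "temperature": 0}),
    )
    result = json.loads(response["body"].read())
    print(f"--- {model_id} ---")
    print(result["completions"][0]["data"]["text"].strip())
```

Temperature 0 keeps the comparison repeatable; a real evaluation would loop this over a representative sample of your own documents, with your own notion of "good" applied to each output.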

Ryan
So I'm going to interrupt you for a minute, Dan. I think what you're getting at, and it might be your third statement, is something that excites us about AI21: you guys have started to focus on smaller models for specific applications, right? To kind of get past some of these issues you're bringing up. That's kind of the big thing that we're seeing as well, right? A large language model is awesome. It can do a lot of things, and it can do a lot of things well. But as a customer, can I afford it? Is there an ROI for me there? So I think that's one of the things that excites me, and where I think you were going. So maybe you can talk a little bit about the small models and specific use cases you guys at AI21 are pushing towards.

Dan
Yeah, sure. And that's absolutely right. I'll say that, broadly speaking, with our technology we are trying to focus on what matters in practice, especially for enterprise use cases. And that means a lot of things, but among them it means reliability, and it means affordability at scale. These are two things that we prioritize in how we build our models, and that's true for our foundation models as well as the task-specific models. So you can expect Jurassic-2 to really shine on tasks that require kind of tight grounding: tasks where you don't want to deviate from the facts given in some input context, and you want to remain true to that source. Jurassic-2 will generally be very good at those. I think where we took it to the limit, or to the next step, is with our task-specific models, which try to take a specific task that is a recurring theme that many users and organizations will likely try to take on. One example would be what we call contextual answers: essentially, question answering given a particular grounding text. We've taken that task, invested a lot of R&D into optimizing our models specifically for it, and packaged them in an API that gives you stricter expectations on the input, the output, and the relation between the two. So for our contextual answers model, you would give it a question and a grounding context, basically a document. And what you expect to get in return is a relatively succinct but complete factual answer that sticks to the facts given in the context. And if you ask a question that goes outside the context... let's make it concrete with an example. Suppose the context is your shipping policy on an e-commerce site.

Dan
And the question is: what is the value of your company's stock? What you would want the model to do, and what we've engineered our task-specific models to do, is in this case to answer: I can't give you an answer; the answer is not in the context. So this is where the reliability comes in. You know that we've built the task-specific model for contextual answers to avoid hallucinating information, where a general-purpose model that was tuned to make people happy, tuned to human preference in general, would tend to make stuff up.
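
AI21 packages this behavior as a hosted task-specific API; as a purely illustrative sketch of the contract Dan describes (question plus grounding document in, a grounded answer or an explicit refusal out), here is one way to emulate it with a general-purpose Jurassic-2 model on Bedrock. The prompt wording, refusal string, and model ID are my assumptions, not AI21's implementation:

```python
import json
import boto3

# Illustration of the grounded-QA contract: the model must answer from the
# supplied context only, and refuse when the answer isn't there.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

context = "Orders ship within 2 business days. Shipping is free over $50."
question = "What is the value of your company's stock?"

prompt = (
    "Answer the question using ONLY the context below. If the answer is not "
    "in the context, reply exactly: Answer not in context.\n\n"
    f"Context: {context}\n\nQuestion: {question}\nAnswer:"
)

response = bedrock.invoke_model(
    modelId="ai21.j2-ultra-v1",  # assumed Bedrock model ID; verify in the console
    contentType="application/json",
    accept="application/json",
    body=json.dumps({"prompt": prompt, "maxTokens": 100, "temperature": 0}),
)
result = json.loads(response["body"].read())
print(result["completions"][0]["data"]["text"].strip())
# Expected behavior for this question: "Answer not in context."
```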

Ryan
Well, I think it also goes to guardrails, right? And what people are concerned about is: hey, I create a general chat bot. Say I'm a bank, and I created a general chat bot, and now people come in and they're just using it to use the LLM and not pay for it, while I'm paying huge fees, right? So I think it's important to have those guardrails and have a solution that will tell them: that's not why you're here. I think that's super valuable as well. Now, as we get going, I would say, obviously Mission is an AWS-only service provider, and we're super excited that AI21 is part of Bedrock, right? And you guys have been for a long time on the SageMaker JumpStart platform as well, where you have a lot of other models.

Ryan
So maybe talk for a minute just about the things going on with AWS. Obviously there are some exciting things people care about. Hey, is there multimodal? You have a bunch of tasks; maybe you can see multimodal coming in. Is AI21 going to jump into the space like everyone else did and maybe do some image creation? Or the third part everyone talks about: code generation. What are your thoughts on some of those next areas?

Dan
So, you're right. We were kind of pioneers in the AWS AI universe. We were early on the SageMaker JumpStart platform and in the first wave of model providers on Bedrock. And we're very proud of our close partnership with AWS. I think it's working out great for AWS, for us, and for the clients who use the technology through Bedrock and through SageMaker. So if I try to provide a peek into our near-term roadmap, you'll see our task-specific models becoming more sophisticated and more flexible, to cover additional use cases. One exciting new capability that we'll be adding to our summarization portfolio of task-specific models is summarizing conversational texts: things that look like a chat history, or a transcription of a conversation like this podcast, or a sales call, or a daily meeting, whatever.

Dan
Those types of texts, summarizing them is very different from summarizing an essay or an article or something like that. And we're working on providing a good solution for those use cases, which should be coming online very soon, as well as many other tasks with different models that we have lined up.

Ryan
Yeah, I think that's a really cool use case you just talked about, summarization of Slack and other conversations, right? We've got a couple of clients that are call centers, and we've been working on exactly that element, where hopefully you have it broken out into channels so that you can then work on the individual pieces and how they connect together. But yeah, it takes a lot of work, because now you have two channels, and you lose some of the context between the two channels that you're working on. So I can definitely understand why that's a hard problem. Super cool that you guys are working in that space. Now, one of my last questions, since our podcasts are usually pretty short: what are some of the predictions you're thinking about for 2024, Dan? Obviously, the middle of last year is when gen AI caught fire. Now, what's going to happen in 2024?

Dan
So I think I have one big prediction. The big prediction is that '24 is the year where we see, for the first time, a transition from mass experimentation to really adopting at scale. I think you see it all over: large organizations, some of them, have hundreds of use cases being piloted and developed. And obviously many of those will not end up in production, either because they don't make it through the things that cause products to fail in general, and tech products to fail in particular, or because they hit some fundamental limitations of the technology, like hallucinations or guardrails or any of those types of concerns. And I think this year, really soon, we'll start seeing which use cases are sticky and useful and make sense for organizations, and which don't make it out of this infancy stage. And my prediction is that the results will be somewhat surprising.

Dan
Things that people were betting on to be very successful and attractive will not turn out that way, and the other way around as well.

Casey
I wanted to ask a follow-up on that, just because that was an interesting observation, Dan. Do you think that's because the things that look like an obvious fit for generative AI may actually have more complexity than their surface-level compatibility suggests, and maybe vice versa: the things we haven't thought about using it for are actually much better suited to the models than we realize? And could you give a concrete example if I'm on the right track there?

Dan
Yeah, I think that's part of it. So in general, I think it has a lot to do with the harmony and the collaboration between humans and machines. I see a lot of ideas around taking pretty complex workflows and trying to automate them with AI end to end. And I think that in general is very challenging. I'll be delicate, but the more straightforward way to say it would be: in some cases it may be a fool's errand. There is something fundamental about these workflows where, at least where the technology is today, you need humans in the loop. You need to somehow break them down into smaller chunks and think about which of them you're going to delegate to an AI solution, and which are the areas where you're still better off with humans making human judgments.

Ryan
Yeah, I think that's a good point, because I don't know how many times I've been on a sales call where a customer has massively unrealistic expectations, right? Where it's like: I want to replace my entire accounts receivable department with gen AI. But you hear it, right? People are definitely trying to figure out: hey, what can this do? How am I going to get cost savings and benefits working through this? And so I do agree that there are a lot of interesting use cases. We've got a million POCs coming up that we're about to do for various customers, and many are pretty cool use cases that are very valuable. And the ones we shied away from are exactly those where somebody has the expectation that gen AI is just going to rule the world, right? I still have that conversation with a lot of people, where people just think: hey, gen AI understands language, right?

Ryan
Like it's having a conversation with me, when really it's just predicting a vector, right? People still don't fundamentally understand that it's solving a mathematical problem that predicts a vector that returns a response. So it's still kind of fascinating. A lot of this has been talked about over and over again, but people still feel: hey, this machine understands me, it's able to understand my material and give me a good answer. And it's like: no, it's taking your material and making a prediction. So that education component will still be interesting over the next year.
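
Ryan's point can be made concrete in a few lines: at each step, a language model turns a vector of scores (logits) into a probability distribution over possible next tokens with a softmax, and a reply is built by repeatedly sampling from that distribution. A toy sketch, with a made-up vocabulary and made-up logits:

```python
import math
import random

# Toy vocabulary and a made-up "model output": one score (logit) per token.
# A real LLM produces a vector like this, over tens of thousands of tokens,
# at every generation step.
vocab = ["yes", "no", "maybe", "<end>"]
logits = [2.1, 0.3, 1.0, -0.5]

def softmax(scores):
    """Convert raw scores into a probability distribution that sums to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax(logits)
for token, p in zip(vocab, probs):
    print(f"P({token!r}) = {p:.3f}")

# "Generation" is just sampling from that distribution, token by token.
next_token = random.choices(vocab, weights=probs, k=1)[0]
print("predicted next token:", next_token)
```

Scaled up to a huge vocabulary and repeated once per generated token, that loop is the entire "conversation"; there is no separate understanding step hiding anywhere in it.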

Casey
Cool. Love it. As we wrap up, maybe one last question for both of you: what's one thing you were hoping we were going to talk about today that we didn't manage to get to? I always like to try to make sure we cover all the material. So, anything you were hoping we were going to cover that we haven't gotten to yet?

Dan
I think, you know, it's clear that generative AI, LLMs, AI in general, is hot and really exploding, in the best sense of the word. And it's very easy to get overwhelmed and flooded with everything that's happening, right? You log in to Twitter or LinkedIn, and every day there are more exploding-head emojis and outrageous takes than you can possibly comprehend. So what I try to say about that, every time I get the opportunity, is: don't try to keep up with the hype. There is a lot of stuff going on. It's very hard to separate the signal from the noise, and it's probably worth taking a breath and realizing that if you put all of that through some low-pass filter and see what survives the test of time a little bit, even with the benefit of a few weeks' hindsight, I'm not talking about months or years...

Dan
You'll probably save yourself a lot of time and confusion. The world doesn't really change every two weeks, even though the hot takes on Twitter claim otherwise.

Ryan
Yeah. I mean, I think that's super valuable information. I do agree it's pretty crazy. There is one of your competitors that does update their model every two weeks, and the amount of training they're doing to maintain that amazes me. I think the one thing we didn't necessarily talk about, which Bedrock and being in the AWS environment really help preserve, is data privacy and security, right? Being able to run everything in your own AWS account is a super valuable tool. And with AI21 on Bedrock, so you can make API calls to it, that helps you on data privacy and security, which is a huge benefit. So if you're looking at using the various models from AI21, using them inside the AWS ecosystem is a great way to protect yourself on that front.

Casey
That's a great point, Ryan. Well, Dan, thank you so much for joining us. Big thanks to AI21 Labs for being willing to be our first guest episode, and we hope we'll have you on again in future seasons. Thank you so much.

Dan
Thanks, Casey. Thanks, Ryan. It's been a pleasure talking to you, and I hope we'll have more opportunities to speak on the podcast.

Ryan
It was good having you, Dan. All right, that was the end of another Mission: Generate. Excited for you guys to hear this episode, and we'll see you again next week.

Casey
Wow, what a great episode! Huge thanks to AI21 Labs and the rest of their team for making this happen. I'm really looking forward to what we're going to be able to do with their models and Amazon Bedrock in the future. For once, I get to make the pitch for Ryan's team, and since it's not him ending the episode, I get to do a little bragging. So here's something you may not know about Ryan: he's been in machine learning for over 20 years. He actually used to work for the military on satellite imaging, but don't ask him about that; he's probably sworn to secrecy. What this means is that Ryan has seen a lot of trends come and go: a lot of big expectations, false promises, and cheap hype. But the reality is, we've never been more bullish about generative AI than we are today. We have so many projects in flight, touching finance, healthcare, media, retail, SaaS businesses, real estate, you name it. It's everything you can think of.

Casey
This truly is a great time to invest in a generative AI initiative, so I can't recommend Ryan and his team highly enough. They'll take an hour with you, free of charge, just to sit down and understand what you want to do, and then give you great advice on how to do it. Free. It's nuts, but we love doing right by our customers at Mission. If that sounds like the kind of conversation you'd like to have, head on over to missioncloud.com and drop us a line. Since this was a special episode, I'll also make another pitch here: do you think you'd like to be a guest on this podcast? We'd love to have you; reach out to us and let us know. We're always looking for more incredible experts like Dan and his team to have on. Okay, that's it for us. Good luck out there, and happy building.
