As the popularity of ChatGPT grows, it has brought the potential of generative AI to the forefront of public attention. But what is generative AI, how does it work, and why does it hallucinate? Despite its ability to generate everything from coherent dialogues and perfectly rhymed poems to award-winning art and melodic musical tunes, is it truly intelligent? More importantly, how can businesses leverage this technology to drive innovation and growth? This keynote will address all these questions. Moreover, we will explore some of the good, the bad, and the ugly use cases of ChatGPT-like technologies and understand why. This will empower you to leverage the vast potential of generative AI tools productively to create the greatest value. Whether you are an AI enthusiast or a business leader looking to stay ahead of the curve, this keynote promises to provide valuable insights into one of the most exciting developments in the world of AI today.
S?: Please welcome the Chief AI Strategist at PROS, Dr. Michael Wu.
DW: Wow, with that music, I have to say, this is the way.
DW: What do you say?
DW: You guys have watched The Mandalorian, right? When I say, "This is the way," you have to repeat after me and say, "This is the way." Okay, so before I get into the details, I want to say that this is a very difficult topic. Generative AI is so new that even research scientists don't fully understand how it works. But by the end of this talk, I think you will actually understand more about generative AI and ChatGPT than some of the researchers around the world. Because this is a very difficult topic, I'm gonna make it very interactive so you can learn better. I'm only given about 45 minutes, so I need your full cooperation: when I ask for your interaction, please do respond.
DW: So can I have that from you?
DW: Okay, this is the way.
S?: This is the way.
DW: Good, excellent. Okay, so I don't think ChatGPT requires any introduction now. Everybody has played with it, everybody has experimented with it and even used it for productivity gains. And you probably know that ChatGPT is what's called a large language model. Now, to understand what a language model or a large language model is, you have to understand what a language is. A language is really nothing more than the vocabulary plus the grammar. In that sense, I would say that's just the words and the syntax, and in the machine learning community they would just say that it's a very particular sequence of word tokens, that's it. It's just all the words arranged in a very specific way that obeys the rules of your grammar.
DW: The syntax. So that's it, that's what a language is. So what's a language model? A language model is a probabilistic model that captures that sequential structure. That sounds a little bit fuzzy, so what does it really mean? The question it's trying to answer is really just this: given any sequence of words, call it S, can you predict the next word W accurately? If you can do that, then you know the language, because you have to know every word in the previous sequence and what it means. It could be anything, so you have to know all the vocabulary, and you have to know how the words are arranged to convey a specific meaning. That means you have to know the grammar. If the sequence says something that doesn't make sense, you probably won't predict the next word very accurately. So that's written this way: the probability of the word W given the sequence S, P(W | S). So far so good? So now let's do a little exercise to see if you know your English. If I say "the sky is," what is the next word?
DW: Blue, right? Okay, is that it? Is it only blue?
DW: Dim, huge, right? Or anything. So there are actually lots and lots of words that are possible. We hear lots of people say blue, so that's probably the highest-probability word: the probability of the word W being blue is probably the highest. So what is that probability? That probability is the number of times that you see "the sky is blue" divided by the number of times you see "the sky is" followed by whatever. So that's the probability, that's how likely it is to occur. So where do you get that probability from? You get it from any digitized writing that humans have ever created, and that includes basically the web that you crawl. So ChatGPT is trained with these types of data: digitized books and scientific journals, basically everything that's digitized. So it knows the scientific terms, all the very esoteric ones, even chemical names and stuff like that that most people probably wouldn't know. It knows all that vocabulary. It also uses the entire Wikipedia, and compared to all the other data sources, Wikipedia is actually very, very tiny. It only contributes about 3% when you take into consideration all the rest of the text.
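The counting estimate described in this turn (occurrences of the context followed by a given word, divided by occurrences of the context followed by anything) can be sketched in a few lines of Python. This is a minimal illustration and not how ChatGPT is implemented; the toy corpus and the function name `next_word_probs` are assumptions made for the example.

```python
from collections import Counter

# Toy corpus standing in for "any digitized writing"; the sentences are made up.
corpus = [
    "the sky is blue",
    "the sky is blue",
    "the sky is clear",
    "the sky is cloudy",
    "the sky is dark",
    "the sea is blue",
]

def next_word_probs(context, sentences):
    """Estimate P(w | context) as count(context + w) / count(context + anything)."""
    ctx = context.split()
    n = len(ctx)
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        # Slide over every position where the context could be followed by a word.
        for i in range(len(words) - n):
            if words[i:i + n] == ctx:
                counts[words[i + n]] += 1
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}

probs = next_word_probs("the sky is", corpus)
# "blue" gets the largest share; a word never seen after the context,
# such as "delicious", is simply absent, i.e. empirical probability zero.
```

Words that never follow the context in the corpus do not appear in the result at all, which is the zero empirical probability the talk returns to later.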
DW: So these are the data that actually go into training ChatGPT. So now, if you actually look at what this probability distribution looks like, blue obviously is the most probable word, but there are other words that are possible too. For example, the sky is clear, the sky is cloudy, the sky is dark, all those are possible. That's why you need a probability model, because all these things are possible. If there were only one possible word, the language model wouldn't need to be a probabilistic model. Because there are so many different possibilities, you need a probabilistic model. So now, if you look at all the other common English words, which is about 30,000, there are lots and lots of words that we've never seen used to describe the sky. You've never seen "the sky is" followed by those words, so those would essentially have a probability of zero. And if you extend it further out to the entire lexicon in English, it's a little bit over a million words, to include all the vocabulary and scientific terms, legal terms, and medical terms that are in those scientific journals, published journals, and books.
DW: So that's what this probability distribution looks like. So now, for example, you've probably never seen people say "the sky is delicious," so that will have a probability of zero right there. Okay, so now what if I give you a little more context? If I say "the sun just set, the sky is"... orange, pink, what [laughter], what?
DW: Spacing. Okay, so there are lots of other words that are possible now, right? But you wouldn't say the sky is blue anymore. So again, you get the probability as the number of times you see "the sun just set, the sky is" followed by that word, divided by the number of times you see "the sun just set, the sky is" followed by whatever. And if you look at that probability, the top words are no longer gonna be blue, clear, cloudy, or dark. Because the sun just set, it's not gonna be dark. So what used to be the high-probability words have been pushed down to near zero, and some of them actually have zero probability when you put in that additional context. But other words get pushed up. For example, you get blazing, golden, spectacular, and some people say orange or pink. These become the more highly probable words. What that means is that this probability distribution actually gets sharpened; it becomes more concentrated. Now, if I give you even more context, for example, "it's raining hard, the sun just set, the sky is," what is it?
DW: Gray, gloomy, right. So basically, the more context you give it, the more this probability distribution sharpens, and what that really means to you is that this word distribution is able to stay on topic really, really well. Whatever context you give it, the prompt, it's able to stay with that context very well. So now, the model is trained with these data sources: the web crawl, all the digitized books, scientific journals, and Wikipedia. This is about 500 billion tokens. 500 billion tokens is a lot, you know, 500 billion words is a lot, but compared to infinity, it's actually nothing, it's peanuts. So what that means is that if we calculate this probability distribution from what we have, we get what we call an empirical probability distribution. Empirical means it's the distribution that you get directly from the observed data. That doesn't mean it's the true distribution of the word probability. Now, this probability distribution is very, very hard to visualize, so I'm gonna show you another probability distribution that all of you have probably seen sometime in your lifetime: a Gaussian distribution. So this blue histogram here is a Gaussian distribution, an empirical Gaussian distribution.
DW: So you see some places with high probability and some places with low probability. But what does a true Gaussian distribution look like? It's a smooth curve, it looks like this. So now you may recognize that there are some places where the blue histogram, the empirical distribution, has zero probability. However, that doesn't mean that the true Gaussian distribution has zero probability there. The true Gaussian distribution has a very small probability there, but it's not zero. So for example, just because nobody has ever said "the sky is delicious" doesn't mean that it is never allowed or not possible to say "the sky is delicious." If you're writing a children's book about a candy land where the clouds are made of cotton candy, you probably will say the sky is delicious. So that is possible. So coming back to this point, I wanna make it very clear: the fact that the true distribution has non-zero probability means anything is possible. Every word is actually possible under this distribution, and that's why generative AI has this ability to generate novel content. That's what gives it the ability to generate things that never existed before: that probability is not zero. This is the way.
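The contrast between the histogram and the smooth curve can be reproduced with the Python standard library. The sketch below is only an illustration of the point made in this turn; the sample size, bin width, and seed are arbitrary assumptions, not values from the talk.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Draw a small sample from a standard Gaussian and bin it into a histogram.
samples = [random.gauss(0.0, 1.0) for _ in range(200)]
bin_width = 0.5
edges = [-5.0 + bin_width * i for i in range(21)]  # 20 bins covering [-5, 5)

counts = [0] * (len(edges) - 1)
for x in samples:
    idx = int((x - edges[0]) // bin_width)
    if 0 <= idx < len(counts):
        counts[idx] += 1

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Density of the true (smooth) Gaussian curve."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Bins the sample never hit: the empirical probability is zero there...
empty_bins = [i for i, c in enumerate(counts) if c == 0]
# ...but the true density at the center of each of those bins is still positive.
true_density_in_empty = [gaussian_pdf(edges[i] + bin_width / 2) for i in empty_bins]
```

The far-tail bins stay empty in a sample this small, yet the true density there is tiny but positive, which mirrors the "the sky is delicious" example.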
S?: This is the way.
DW: This is the way that generative AI generates new content. Now, so we have this empirical distribution. So how do we get the true underlying word distribution, this probabilistic model that we call the language model? This is where something called the transformer comes in to help. I don't know exactly why it's called a transformer. It uses what we call a self-attention mechanism, so these are transformers that are able to transform into different shapes; essentially they are able to transform the data to focus on the different spots where they should focus. So that's probably why they call it a transformer. But it is what allows this empirical probability distribution to be converted into an approximation of the true probability distribution. So now this true probability distribution is the language model, and if it's a big one, a large one, then it is a large language model such as ChatGPT. This transformer is essentially some kind of neural network architecture, and you don't have to know what it is. You give it some sequence and out comes some probability. The probability that comes out at the end of that model is this language model, this distribution of the words given a sequence. So now, once you have the probability distribution, what can you do?
DW: You could sample from it, just as you would sample from a Gaussian distribution to get a random number. If you have a Gaussian distribution, or any distribution, you can draw a sample from it. So if you have the word distribution, where some words are more common and some words are less common, you can just draw from the distribution to come up with that next word. So that's how ChatGPT is creating those sentences and paragraphs: it's generating one word at a time. It doesn't feel like that, but it does do that. So if you sample that next word W from this distribution, you can make the sequence a little longer: just add the word to your sequence, then feed that back to your transformer model, and then do this again.
DW: Again and again, it generates longer and longer sentences. And that's GPT, okay? Generative Pre-trained Transformer. So now you know that GPT, the generative pre-trained transformer, is a large language model, and it's really nothing more than a random number generator. This morning it was said that we are not using some random number generator to generate those ancillary fares and ancillary prices, but surprisingly, ChatGPT actually does act like a random number generator. The key thing is that if you specify that distribution very, very well, whatever you draw from it will make perfect sense. If that distribution is not good enough, then what you draw will be gibberish, it will not make any sense. So it's a random number generator drawing from a distribution over all the words given any sequence, trained with human-written text, using a transformer neural network architecture.
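The generate-one-word-then-feed-it-back loop can be sketched as follows. The lookup table `NEXT_WORD` is a hypothetical stand-in for the transformer: a real model computes the next-word distribution from the entire sequence, whereas this toy version looks only at the last word, and all probabilities are invented for illustration.

```python
import random

# Hypothetical hand-written next-word table standing in for the transformer.
# The numbers are invented; a real model conditions on the whole sequence.
NEXT_WORD = {
    "the": {"sky": 0.6, "sun": 0.4},
    "sky": {"is": 1.0},
    "sun": {"is": 1.0},
    "is": {"blue": 0.5, "clear": 0.3, "setting": 0.2},
}

def sample_next(sequence, rng):
    """Draw the next word from the distribution for the current sequence."""
    dist = NEXT_WORD.get(sequence[-1])
    if dist is None:
        return None  # no known continuation: stop generating
    words = list(dist)
    weights = [dist[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]

def generate(prompt, max_new_words=10, seed=0):
    """Sample a word, append it to the sequence, feed the sequence back, repeat."""
    rng = random.Random(seed)
    sequence = prompt.split()
    for _ in range(max_new_words):
        word = sample_next(sequence, rng)
        if word is None:
            break
        sequence.append(word)
    return " ".join(sequence)

text = generate("the")
```

Changing `seed` changes which sentence comes out, which is the "random number generator" behavior described above: the quality of the output depends entirely on how good the distribution is.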
DW: That's it, alright. So you may say, well, that doesn't sound very intelligent, that doesn't sound very smart at all. But you may also say, well, this is just a base model, this is just GPT, it's not really ChatGPT yet. So this GPT, the base model, is pre-trained using a supervised learning mechanism. Some people call it self-supervised because humans don't have to explicitly label everything. But in fact, humans do provide the labels, because all those texts were written by humans at some point in time, and we just take all those texts as they are. So I treat it more like plain supervised learning, but some people call it self-supervised.
DW: But anyway, those are details. So what this probability distribution is able to do is produce plausible word continuation, text continuation. If you give it some text, it's able to continue that text in a plausible way. The problem is that a plausible continuation of text is not necessarily a good response to a command or a question. For example, the user could ask, "Explain how cost-based pricing in B2B works," and this large language model could easily say, "Explain how value-based pricing in B2C works." That's a plausible continuation. In fact, the model has probably seen, somewhere in the training data, some exam question that lists these things back to back. So we need to train ChatGPT not to give this type of response, but to give a more realistic kind of response that actually answers your question.
DW: So this GPT base model is fine-tuned, using transfer learning, to actually follow instructions, to not just provide a plausible continuation but also respond to your question. If you do that, ChatGPT now knows how to respond to you. But there's also another problem: ChatGPT knowing how to respond to people's questions does not necessarily make for a good open dialogue. If you remember some of these dramas, I don't even know what their names are, there's some bad boyfriend who just doesn't want to talk to the girl. He responds to everything that the woman asks, and he answers her perfectly, but short and succinct. He just doesn't wanna have a conversation.
DW: So when you see that, that's kind of what would happen here if you released this model right now. ChatGPT would be a very bad conversation partner, because it does give you an answer to everything but doesn't wanna have a conversation. So the people at OpenAI basically said, well, now we need to train this model to learn what a good conversation is. But that's actually very, very hard. You can't really describe what a good conversation is. It's very hard to write down an objective function that you can optimize to get a good conversation. So instead, what they do is what we call reinforcement learning with human-in-the-loop ranking of good dialogue.
DW: Okay. So basically ChatGPT would generate a response, and you want some humans in the loop to tell it how good this response is, and that's essentially what you need for reinforcement learning. For those people who took the workshop yesterday, you would know what that is. So basically ChatGPT just randomly generates a lot of responses. Not completely random, right? Because what's random to ChatGPT is actually following the word distribution. To us, because that word distribution encapsulates all our grammar and all the vocabulary, it will look like text that makes perfect sense. So it's generating these responses, and humans have to tell it what is a good response, what is a good conversation. Now, that actually has a few problems, because scoring a dialogue is kind of nebulous, and it's very inconsistent among different users, so it's very hard to calibrate different users. One person may say this is a good dialogue; another may say that's a terrible dialogue.
DW: So the way that OpenAI overcame that is they trained another model. They generate a whole bunch of responses, for example four responses here, and then humans rank them to see which response is better. Then, based on these rankings, they use supervised learning to train another model. So it is using the human rankings to train a supervised model, and this learned supervised model is what we call the reward function. Now, this reward function goes into the feedback loop to score every response that GPT generates. Now ChatGPT knows how to have a conversation. Okay, so that is how ChatGPT was trained. There's a lot going on, but that's how it was trained. Now, the thing to recognize is that ChatGPT still has no real-world knowledge.
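One way to turn human rankings into a learned reward function is to train on the pairs implied by each ranking. The talk does not say which loss was used, so the pairwise logistic loss below is an assumption, as are the two made-up features, the linear model, and the names `reward` and `train_reward_model`. It is a minimal sketch of the idea, not OpenAI's training procedure.

```python
import math
from itertools import combinations

# Each response is reduced to two invented features: (relevance, engagement).
# A human ranked four responses to one prompt, listed here from best to worst.
ranked_responses = [
    (0.9, 0.8),  # ranked best
    (0.8, 0.3),
    (0.4, 0.5),
    (0.1, 0.1),  # ranked worst
]

def reward(weights, features):
    """Linear reward model: a single score per response."""
    return sum(w * f for w, f in zip(weights, features))

def train_reward_model(ranking, steps=500, lr=0.5):
    # Every (better, worse) pair implied by the ranking is one training example.
    pairs = list(combinations(ranking, 2))
    weights = [0.0, 0.0]
    for _ in range(steps):
        for better, worse in pairs:
            margin = reward(weights, better) - reward(weights, worse)
            # Ascent step on log(sigmoid(margin)); its derivative is 1 - sigmoid(margin).
            grad_scale = 1.0 - 1.0 / (1.0 + math.exp(-margin))
            for i in range(len(weights)):
                weights[i] += lr * grad_scale * (better[i] - worse[i])
    return weights

weights = train_reward_model(ranked_responses)
scores = [reward(weights, r) for r in ranked_responses]
```

Once trained, the reward function can score any new response without a human, which is what lets it sit inside the reinforcement learning feedback loop.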
DW: It was never trained to understand the world, to understand physics. It doesn't understand that if I take this and throw it somewhere, it's gonna fall. It doesn't know water will pour. It doesn't know anything about the real world. It's all just text that's written by humans, and it knows how to have a conversation, and that's it. So now the key thing is that it will hallucinate. In fact, generative AI by construction has the ability to generate things where the real data have a probability of zero. Where the empirical distribution has probability zero, this distribution actually has non-zero probability, so it will generate things that don't exist in the real data. That's why they call it generative AI. So let me just show you an example of what that is. Okay, so tell me the names of all every window product, guess how it responds. This is actual text generated from ChatGPT. So air traffic are real products, [0:21:57.5] ____ real products, and then it actually generates a whole bunch of other stuff. So, Anton, are you here? Seth, are you here? Where's Seth?
S?: I'm here.
DW: Okay, are any of these like product like names ever used in every Windows product.
S?: One and two.
DW: One and two, only one and two, the rest of it is actually what, are they ever used?
S?: Yeah, they are.
DW: No, never used, Are some of these products ever kind of thought about or?
DW: No, completely fabricated, and it actually generates them in a way that's very, very believable. It knows to put that "air" something prefix on the brand, and it describes what the product does, and it sounds very legitimate. It's hallucination, complete fabrication. Number three to number six don't exist and never existed before, but it's able to generate them because where something doesn't exist, where the empirical probability is zero, the true underlying word distribution has non-zero probability. This is the way.
S?: This is the way.
DW: Okay, right. So now, hallucination will happen. The question is, is that a feature or a bug? If you are in what we call the design or creative industries, yeah, it's a great feature. People have been using ChatGPT to create designs, any designs: drug designs, mechanical designs, art, fashion, whatever. Perfect. You want it to generate stuff that doesn't exist. But what happens when you have a fact-based application? That's the problem, isn't it? So a fact-based application requires something called grounding. What does grounding mean?
DW: So grounding. You have this true distribution, the true Gaussian distribution, and you also have an empirical distribution that's calculated from your data. What grounding means is that when there's no data in the empirical distribution, don't make things up. Well, you could say, I could just set all the probability to zero where there's no data. But then you don't have generative AI anymore. The fundamental property of generative AI is the fact that where the empirical data distribution is zero, the true underlying distribution is actually non-zero. If you set that to zero, you can't generate anything that never existed, you can't generate anything new. Then what good is that? It's good for something, but not generative AI. So does that mean that you cannot use generative AI in a fact-based application? No. In fact, as you saw this morning in some of the demos, you can actually use ChatGPT to get an answer. But how does it actually work?
DW: That actually works under a mechanism called RAG, retrieval-augmented generation. For fact-based applications, information retrieval is perfect. For example, search, internet search or enterprise search, whatever, is perfect for fact-based applications. If you can't find it, it doesn't exist; if it exists, you should be able to find it, if you use the right query and the right kind of keywords. So that's information retrieval. What you can do is use conventional information retrieval, such as search, to get some answers, then give those answers to ChatGPT and have ChatGPT summarize them and give you the result. And when you cannot find anything, you can just tell the user, "I don't know." This is called retrieval-augmented generation, RAG, and this is how the demo you saw this morning actually works. Yesterday, Linda talked about putting guardrails on generative AI, and this is one of the mechanisms: retrieval-augmented generation. This is the way.
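The retrieve, then summarize, then fall back to "I don't know" flow can be sketched as below. Everything here is an assumption made for illustration: the two-entry knowledge base, the keyword-overlap retrieval, and the `summarize` function, which is a plain-text stand-in for the call to a language model.

```python
# Hypothetical knowledge base; the entries are invented for illustration.
KNOWLEDGE_BASE = [
    "Cost-based pricing sets the price from production cost plus a markup.",
    "Value-based pricing sets the price from the value perceived by the customer.",
]

STOPWORDS = {"the", "a", "is", "of", "from", "by", "how", "does", "what"}

def keywords(text):
    """Lowercase the text, strip simple punctuation, and drop stopwords."""
    words = (w.strip(".,?!").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}

def retrieve(query, documents, min_overlap=2):
    """Conventional keyword retrieval: keep documents sharing enough keywords."""
    query_words = keywords(query)
    return [doc for doc in documents if len(query_words & keywords(doc)) >= min_overlap]

def summarize(query, passages):
    """Stand-in for the language model; a real system would prompt an LLM
    with the query and the retrieved passages here."""
    return "Based on the knowledge base: " + " ".join(passages)

def answer(query, documents=KNOWLEDGE_BASE):
    passages = retrieve(query, documents)
    if not passages:
        # Nothing retrieved: do not let the generator make something up.
        return "I don't know."
    return summarize(query, passages)

found = answer("How does cost-based pricing work?")
missing = answer("What is the capital of France?")
```

The generator only ever sees passages that retrieval actually found, so the fallback branch is what keeps a fact-based application from hallucinating an answer.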
S?: This is the way.
DW: This is how you use generative AI in a fact-based application. Okay, so as in the title, we're gonna go a little bit beyond ChatGPT now. There are some very generic generative AIs, as you've seen, for text, for visuals such as images and videos, and for audio as well, for speech and music and so on. If you just look at the text category, there are already lots and lots of different models. ChatGPT is only one of the models. In fact, ChatGPT is not even the newest. The transformer architecture that I told you about was invented by Google, and the first very powerful language model of this type to come out of the transformer invented by Google is BERT, the first one that you see there. And each one of these models can already give you lots and lots of applications. You can have it help you with general writing, summarize notes, or compare and contrast text. Compare and contrast is very useful when you have a large set of documents to review and want to see what's new, what's similar, or what's different.
DW: That's very, very useful. And of course, generate content for marketing, for support, for sales, anything. Each one of these applications can already give rise to lots and lots of startups, and there are many more to come. If you look at the code part, code is nothing more than a very specific kind of language, a language that you use to talk to machines, so you can do the same thing: you can do code generation, you can do documentation, create SQL queries, and so on. And there are many more, such as images and videos, speech, and even music. So there are lots of types of generative AI. But I do want you to think a little bit about what's beyond this generic type of AI. For example, there are highly specialized AIs that generate 3D worlds. This one, MDM, the human motion diffusion model, has to know how humans move. You can't bend my arm the other way, I'm not an alien. So it has to generate motions that are actually plausible. It has to understand a little bit more than just the visuals, the pixels that it generates on the screen.
DW: For example, in biotech, you can already generate proteins, going from protein folding, the 3D structures of proteins, to generating drug candidates and so on. So there are many more to come. In the future, there will be models that generate mechanical designs, that even design your car, the shape of your car, or the shape of the wings of your planes, to optimize aerodynamics and so on. These are all coming. In the future, there will be a lot more of these types of models, a lot more use cases, and many, many more startups to come as well. Okay, alright. Ever since ChatGPT was launched, lots of people have been approaching me and saying, "Well, how can we use it?" I hear so many different use cases, and a lot of them are, I would say, good use cases, but as always, there's the good, the bad, and the ugly.
DW: So let's save the good for later and take a look at some of the bad and the ugly use cases. These are use cases that I would say are not exactly bad, they just may not be the best use of the technology. The problem is that whenever a new technology comes along, if we have a new hammer, we see everything as a nail. So let's take a look at some of these. I'm gonna ask you guys: discovering market trends, do you think that's a good use case? Why, or why not? The problem with this use case is that ChatGPT doesn't know anything current. If you ask it what's happening this morning, it's gonna say, "My training stopped in 2021," and it doesn't know anything that occurred after that.
DW: Bad use case. But remember what I said about retrieval-augmented generation. You could certainly use your plain old search to search the internet and then give the results to ChatGPT and have it summarize them for you. You could do that, but that's not really ChatGPT generating the content anymore; you're using it as a summarizing engine. It could work, but it's not really what I would call a good use case. Another one: automated SEO keyword generation or keyword research. Do you think that's a good use case? Okay, so ChatGPT can give you a set of keywords, but it will have no idea how they perform. It's not trained on search engine results. It's a good starting place, and that's why this copilot idea is actually great: it's a good starting place for you to explore and do more research on those keywords. But if you wanna automate everything with ChatGPT, it's not gonna work very well.
DW: Another one: English pricing method to formula. Can I use natural language to describe a formula and have it translated into a formula for me? I think that's a case that could work also, but the problem is that these pricing methods are basically Excel formulas. If you know what pricing method you want, it's probably easier for you to just type it into Excel. If you want to explain a very complex formula with five levels of parentheses, good luck describing that in English. The reason we have formulas is that English is not the best way to do everything. On the other hand, I would say the reverse would be a very good use case. You have a very complex formula and you don't know what it's actually doing: give it to ChatGPT and say, "Explain to me what this pricing method is doing, I don't understand it." That's a good use case. But if you know the formula, you know the pricing method, just type it in, that's all.
DW: Alright, let's look at some other ones. Employee performance review reports. One of my cousins actually uses this. She is a senior leader in a big company's IT department, and she has to write lots and lots of performance reports for the people reporting to her. She now prepares everything in an Excel sheet, with a few bullet points for each person, and then it generates a very nice performance report for everybody that reports to her. It works, not at scale, but it could work. How about this one: explaining complex data and charts. Do you think that's a good use case? Come on, let's be interactive. Yes?
DW: Some people say yes, some people say no. This is one of those things where, again, the reason charts and data exist is that English is not the best way to present data. You could use English to explain what the charts and the data are showing you. Very often when you see a chart or graph, you have a figure legend or something to explain what you should focus on. You could definitely do that, but you should not use it to replace all those complex charts and graphs, 'cause that's not gonna work. How about this last one here: enterprise knowledge base. Well, what you saw this morning from Claudia is essentially an example of that. It's not just searching; it's understanding our context, and it's using the RAG method that I told you about before.
DW: Retrieval-augmented generation: it's searching our knowledge base, giving those results to ChatGPT, and then summarizing the results and giving you the references as well. In fact, that's how the new Bing search works. That's how it's able to give you those references; otherwise, ChatGPT would not be able to give you any references at all. From my experience with a lot of the companies that I talked to, this use case actually did not work too well. That's mainly because a lot of companies don't have a very good knowledge base that's well labeled and well tagged: which version of the software, which product, and when it was generated. If you do have a very well-labeled, well-tagged knowledge base, it will work very well. But if you don't, ChatGPT cannot fix your bad data problem. Okay, so these are what I would call the bad and ugly kind of use cases, and some of them could work. But what are some of the good use cases?
DW: We should use generative AI for its generative nature. Remember, the generative aspect of generative AI is the feature, okay. If you basically set those near-zero probabilities to zero, you don't have a generative AI anymore, okay. But what that means is that you should use it in cases that are very transient, for example, not prescriptive, where it cannot be pre-programmed. It has to be created at the instant, at the moment, and used right away. So those are the good use cases of generative AI. One thing that people often ask is, why can't I have generative AI write code for me? Engineers and data scientists, you've been writing lots of code. Now, people are saying that English will be the most popular programming language in the future, because you just tell ChatGPT what you want and it'll write code for you. But there's a problem with that today. Anybody know what the problem is?
DW: Because of copyright issues. So the court has just ruled that content that is generated from these generative AIs is not protected under copyright law, 'cause you... There's a person, I kind of forgot her name, she created a comic book using these image-generating generative AIs. She created all the prompts to generate all the images and create a comic book. That comic book is not copyright protected, because the court basically says you don't even know how it's generated. You could copyright all those prompts that you used to generate the comic book, but the comic book is generated by generative AI and you didn't even know how you generated it, so you can't copyright that. So you could use the code, but that code won't be protected by copyright.
DW: So any of your competitors could strip out your code base and just implement it anywhere else, so it's not good. But does that mean that you cannot use these generative AIs to write code for you? Certainly not. There are cases where you can generate code that you don't care about people taking, and those are the cases where the code is meant to be executed maybe only once and never executed again, so you could throw it away. So why would I write code that I want to execute only once and throw away? Here's one use case: you can use GitHub Copilot or Codex, for example, to generate a SQL query from natural language. The key here that makes it a good application for generative AI is that you can ask any natural language question, any question, against a database.
DW: So obviously the prompt will have to include some information about the database. It has to know the schema, the data dictionary, what each column actually means and all that stuff. One example of that is, you could say, what's my total margin improvement for my French customers last quarter? So it will actually generate a SQL query, that SQL is then executed against the database, and it returns the answer for you. That code, you don't really care if it's throwaway, it's just executed once. 'Cause what are the chances that you're gonna ask that question again? Maybe you ask the next question and it's generated again. It's no big deal, you don't have to keep that. You execute it against a database and come up with an answer.
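The flow described here (schema in the prompt, a generated query, one execution, then throw the query away) can be sketched in a few lines. This is a minimal illustration, not the system shown in the talk: the `sales` table, its columns, the quarter label, and the `fake_llm` function are all hypothetical stand-ins, and `fake_llm` returns a canned query instead of calling a real model.

```python
import sqlite3

# Hypothetical schema; a real prompt would carry the actual data dictionary.
SCHEMA = """
CREATE TABLE sales (
    customer TEXT,            -- customer name
    country TEXT,             -- customer country
    quarter TEXT,             -- e.g. '2023-Q2'
    margin_improvement REAL   -- margin improvement in EUR
);
"""


def build_prompt(schema: str, question: str) -> str:
    """Put the schema ahead of the question so the model knows the tables."""
    return (
        "Translate the question into a single SQLite SELECT statement.\n"
        f"Schema:\n{schema}\n"
        f"Question: {question}\n"
        "SQL:"
    )


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call (assumption): returns a canned query."""
    return (
        "SELECT SUM(margin_improvement) FROM sales "
        "WHERE country = 'France' AND quarter = '2023-Q2'"
    )


def ask(conn: sqlite3.Connection, question: str):
    """Generate a throwaway query, run it once, and return query and answer."""
    sql = fake_llm(build_prompt(SCHEMA, question))
    value = conn.execute(sql).fetchone()[0]
    return sql, value


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("Acme SA", "France", "2023-Q2", 1200.0),
        ("Bolt SARL", "France", "2023-Q2", 800.0),
        ("Cobalt GmbH", "Germany", "2023-Q2", 500.0),
        ("Acme SA", "France", "2023-Q1", 300.0),
    ],
)

sql, total = ask(
    conn,
    "What's my total margin improvement for my French customers last quarter?",
)
print(sql)
print(total)
```

Nothing in this sketch stores the generated query; each new question would produce a new one.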
DW: Now, obviously we have to put some guardrails in place because there are risks associated with this generative AI. For example, you may want to make sure that people cannot write to the database. If they say, like, "Update my database with blah, blah, blah," that should not be allowed. They have to respect access permissions. And for adoption and value, it's actually very important that you have the ability to explain how you came up with the answer. If I just tell you that the answer is 5.78 million euros, how do you know it's right? How do you know it's right? You asked a question against this database because you didn't know the answer.
DW: But if we give you the answer, how do you know it's right? You have no way of knowing. If I just give you the answer, you have no way of knowing. So what would you do? You'd probably go and write the SQL to check. If you're gonna write the SQL to check, why use ChatGPT in the first place? No point at all. No value gained. So basically you have to be able to step through the calculation. The problem is that English, again, is not a very precise language. When I say, for example, "French customer," do you mean only the customers in the country of France, or do you mean all French-speaking countries? Do you care about the French colonies or all the other places where French is used? So French customer could mean a lot of things. So if you listed a whole bunch of customer [0:41:22.3] ____, oh, maybe this one should not be considered. So you need some iteration to update this SQL query.
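The two requirements above, no writes and an answer you can step through, can be sketched as follows. This is an illustrative assumption rather than the guardrail used in any real product: `is_read_only` is a deliberately conservative text check (it rejects anything that is not a single statement starting with SELECT, including harmless queries that contain a semicolon inside a string), and a real deployment would also rely on database-level permissions. The `sales` table and its rows are made up.

```python
import sqlite3


def is_read_only(sql: str) -> bool:
    """Accept only a single statement that starts with SELECT."""
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:  # more than one statement: reject
        return False
    return statement.upper().startswith("SELECT")


def run_with_explanation(conn: sqlite3.Connection, sql: str) -> dict:
    """Run a guarded query and return the SQL alongside the rows it produced."""
    if not is_read_only(sql):
        raise PermissionError("only a single SELECT statement is allowed")
    cursor = conn.execute(sql)
    columns = [description[0] for description in cursor.description]
    return {"sql": sql, "columns": columns, "rows": cursor.fetchall()}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, country TEXT, margin REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("Acme SA", "France", 1200.0),
        ("Bolt SARL", "France", 800.0),
        ("Dune SA", "Belgium", 400.0),
    ],
)

# Show which customers were counted as "French", so the user can object
# (for example, "should the Belgian customer be included?") and iterate.
breakdown = run_with_explanation(
    conn,
    "SELECT customer, country, margin FROM sales "
    "WHERE country = 'France' ORDER BY customer",
)
for row in breakdown["rows"]:
    print(row)

# A write request is refused before it reaches the database.
try:
    run_with_explanation(conn, "UPDATE sales SET margin = 0")
    blocked = False
except PermissionError:
    blocked = True
print("write blocked:", blocked)
```

Returning the query and the per-customer rows, rather than only a total, is what lets someone check how "French customer" was interpreted.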
DW: So remember, without trust in the system, there's no value. So people need to be able to trust the system, and to build this trust, you need to be able to explain what you do and how you come to that answer. Okay, so another use case, and I've seen a very good use case here, is that you can use ChatGPT to essentially be this virtual agent for a guided shopping experience and for personalization of contracts in the contracting phase. So during any kind of B2B sales scenario, there's usually a shopping phase where the buyer would go to a salesperson and maybe ask him a whole bunch of questions about what he wants to buy. Maybe he needs a recommendation for a new product, a better product, a better performing product. You could say, "The last light bulbs that I bought from you all failed after two years, can you get me something more reliable than that?" So it needs to know what's in the catalog, and it probably needs to know something about the ratings and reviews of different products. And so after you are done with the shopping, you come to the phase where you start to negotiate on price. So you say, "Well, if I buy an additional 500 units, can you give me one more percent discount?" And so on. So you start to negotiate.
DW: So basically, here's where you have to put guardrails in. You have to comply with business rules. And you also have to make sure that you have a separate model instance for the different customers, 'cause you certainly don't want customer A to ask ChatGPT, "Oh, what kind of discount is this vendor giving to customer B?", which is a competitor. You certainly don't want that to happen. So you want to prevent this accidental information leakage, because ChatGPT remembers some things if you let it remember. So the guardrails have to be put in place. And then finally, you get to this contracting phase. The contracting phase is where you may want to automate things such as shipping to different places and so on. So something that's actually very useful here is that you can encourage the salesperson to input this personal relationship information with the buyer. So CRM has been trying to get people to input this personal relationship information into the system forever. And sales never wanted to do it. They never do it because, what do they have to gain? They have nothing to gain. And in fact, they'd be a little bit more, I would say, de-leveraged.
DW: Because if I put the information in there, then everybody knows it, and it basically doesn't help them at all. But in this case, it could actually help the salesperson, because if I put in that personal relationship information, you can generate a very personalized contract that's completely effortless. For example, if you know the buyer is vegan, at the end of this contract you say, "Next time you visit town, let's go and visit this vegan restaurant that just opened." Something like that, just very personal. Okay, when you actually do that, ChatGPT actually remembers those things. So even after the salesperson has left the company, you actually retain that knowledge, which is really, really good. Okay, so let's look at...
DW: So that's a B2B use case. So let's look at a travel use case. In one travel use case, you use DALL-E and ChatGPT to generate hyper-personalized, hyper-localized marketing micro-content in the pre-sales phase. And you can also use ChatGPT to be essentially a virtual tour guide in the post-sales phase. Obviously, you need information in the prompt to tell ChatGPT about your previous trips, your loyalty status, and maybe even your social media, to understand what kind of interests you have. For example, the micro-content that's generated for different kinds of travelers could be very different. If you know I'm a foodie, you would say, "This destination is a foodie heaven, you should go there and try out all the local dishes." Or you show me all the most delicious dishes locally. But if you know I'm a hiker, you show me a completely different image. You show me that this is a hiker's paradise. All the most beautiful hiking trails in the country are in the destination. And if you know that I'm an animal lover, you show me all the animal sanctuaries, where you can spot wildlife and so on.
DW: So obviously there need to be guardrails here too. You have to respect users' privacy and their opt-in, opt-out status. So the value there is that, because of such a personalized experience, where it knows what kind of interests you have, it actually improves conversion rate. Moreover, you have this post-sales phase after you actually... Very often in customer experience, there is what we call an experiential gap between the pre-sales and post-sales phases. Before you buy, I'm super nice to you. After you buy, I don't want to talk to you, 'cause every time you talk to me, you call me at the call center, it costs me money. So that's the pre-sales, post-sales experiential gap. But with this you can very easily bridge that and make the user actually have a consistent pre- and post-purchase experience.
DW: So, because after you purchase, now you know my trip, where I'm going, but you also know where I've been before. You can actually recommend places that I've yet to visit. You can recommend an itinerary, but not just the popular itinerary: places that I'm specifically interested in, that I have not been to before. So, a much better customer experience. So these are, I would say, good use cases. So let me talk a little bit about the science, 'cause I'm a scientist. So why does it actually work so well? You've seen how ChatGPT is trained. The key thing is basically that probability distribution of the words. That has to be very well specified, and the transformer does that for you.
DW: Why does it work so well? This is a concept that we call high interdimensional correlation. That may not mean anything to you, but it's very, very apparent in image data. So this is an image where I have 99% of the pixels masked out, so you should not be able to see what's underneath it. If you could see it, you're a genius. [chuckle] What if I reveal more? Each pixel is actually telling you a lot of information about the pixels around it. So if I reveal a little more, so 97%, you still can't tell what it is. At 95%, you could start to kind of see what the picture underneath is. If I ask you to fill in those black pixels that have been masked out, you could kind of do that. Not very reliably at this stage, but at 90% you could kind of do it. You won't get the exact pixel perfectly, but you'll get one that's actually decent and probably visually indistinguishable from the final picture. So if I give you an 80% mask, you have no problem doing that.
DW: If I actually give you 60%, it becomes very, very easy. So basically each one of these pixels is telling you a lot about what the neighboring pixels should be. This is what I mean by interdimensional correlation: they're correlated. Every pixel is highly correlated with the neighboring pixels. It's telling you a lot about what the neighboring pixels should be. So you could fill those in without much problem. 30%, and this is the final picture. So as I said, if I ask you to fill in those black pixels, you will not get every splash of the water correctly, every shadow, every sculpture, every crack perfectly, but you'll get one that's consistent. You probably can't even tell the difference. So that's very apparent in image data, but it's also very apparent in audio data as well. Has anybody seen audio data? What does audio data look like? It actually looks like this. So this is what a simple tune will look like, and this simple tune is this, don't fail me.
DW: So you're seeing that each one of those stripes is essentially a musical tone. So now let's try something a little bit more complex. So how about this?
DW: Okay, that's what audio data looks like. Let's try one more. How many people have actually seen this kind of audio data before? These are what we call spectrograms. If you play music or something, you probably have seen this. So let's try another one. This one is a little harder, but you can still see a pattern there.
S?: And the only way to be truly satisfied is to do what you believe is great work. And the only way to do great work is to love what you do. If you haven't found it yet, keep looking and don't settle.
DW: Don't you love that speech? It's a great speech. So you can see that there are some patterns there. You may not know what the underlying pattern is, you cannot describe the pattern, but there is a pattern, because each time sample tells you a lot about the other time samples in this audio data. So I'm gonna do the same experiment again for you. So this is some music, and if I ask you to fill in those black stripes that mask out 80% of the time samples, can you do it at 80%? You probably couldn't do it. But if I reveal a little more, at 70%, you could kind of start to do it. And if I do it at 30%, you could definitely do it. You could fill in those black stripes and basically come up with music that sounds probably indistinguishable from what it's supposed to be. So now at this is...
DW: Okay, there you go. Okay.
DW: So the key to remember is that you can actually see the pattern. You could hear the pattern, you could see the pattern, and you could fill in the missing data very easily. Coming back to text. So ChatGPT wrote a birthday poem for me when I turned 50. So text is a little bit harder. If I mask out like 80%, you probably couldn't do anything about it, but if I mask out 10%, these are 10% of the words masked out at random, you could fill in the rest of it with very little trouble. Again, you may not get the exact word perfectly. Try to do this yourself. Try to do it, and you may not get exactly the word that was underneath the mask, but you'll get one that makes perfect sense. So that's text.
DW: So the ability of these generative AIs to generate text, images and music comes from the fact that if you just give it a little bit of information, it can generate a lot, 'cause each piece actually tells you a lot about its neighbors. Each pixel is telling you a lot about what the neighboring pixels should be. And in text, each one of the words tells you a lot about what the next word should be. And the more you give it, the more information it has to work with.
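The masking experiment can be imitated on a toy signal. The sketch below is only an analogy under stated assumptions: it uses a smooth sine wave in place of an image or a song, and plain linear interpolation between the nearest visible neighbours in place of a trained model. The function names are hypothetical. It shows the same qualitative effect: the fewer points are masked, the closer the filled-in values get to the hidden originals.

```python
import math
import random


def choose_hidden(n, fraction, rng):
    """Pick positions to mask; the first and last points always stay visible."""
    return set(rng.sample(range(1, n - 1), int(fraction * (n - 2))))


def fill_by_interpolation(values, hidden):
    """Fill every masked position from its nearest visible neighbours."""
    visible = [i for i in range(len(values)) if i not in hidden]
    filled = list(values)
    for left, right in zip(visible, visible[1:]):
        for i in range(left + 1, right):
            t = (i - left) / (right - left)
            filled[i] = (1 - t) * values[left] + t * values[right]
    return filled


def mean_abs_error(original, filled, hidden):
    """Average error over the masked positions only."""
    return sum(abs(original[i] - filled[i]) for i in hidden) / len(hidden)


rng = random.Random(0)
n = 1000
# A smooth signal: each sample is highly correlated with its neighbours.
signal = [math.sin(2 * math.pi * i / 200) for i in range(n)]

errors = {}
for fraction in (0.99, 0.90, 0.60, 0.30):
    hidden = choose_hidden(n, fraction, rng)
    filled = fill_by_interpolation(signal, hidden)
    errors[fraction] = mean_abs_error(signal, filled, hidden)
    print(f"{fraction:.0%} masked: mean fill-in error {errors[fraction]:.4f}")
```

The filled-in values are never exactly the originals, only close to them, which mirrors the point that the reconstruction is consistent rather than pixel-perfect.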
DW: So sorry, I did not acclimatize well, I feel like I'm out of breath. Okay. So that's how ChatGPT and other generative AIs, such as DALL-E and Midjourney, are able to generate photographs that are award-winning and generate art that actually wins awards. And even in the music industry, we see that this generative AI can already deep-fake the voices of famous singers. It's gonna flood the streaming platforms, and it's got the people on the hills worried about how it's gonna disrupt the industry. It's actually winning awards. It may even win a GRAMMY award in the future. Do you believe that? You believe that? If you don't believe me, I have spliced together 80 seconds of AI-generated music for you to listen to, and you could tell me afterwards.
DW: Isn't that amazing? It's all done with the same type of technology, predicting the next note, okay? From a given sequence of notes, same thing as ChatGPT, right? So that probably leaves you with a question. Where does that leave PROS? It's so amazing, this generative AI, right? So let me tell you, the type of data that we work with is actually much, much harder to work with. It's not very conducive for generative AI to use the same type of generation mechanism on our type of data. Our data actually looks like this, okay? So this is what we call tabular data. So one stark difference between this tabular data and the type of data that's used in many of the other generative AIs is that it has lower dimensionality. So it doesn't have as many dimensions. It could be tens, hundreds, or maybe thousands of dimensions. But in generative AI, you are working with millions, tens of millions, hundreds of millions of dimensions, right? A 12 million pixel image has 12 million dimensions, right? Much higher dimensionality. But what you see here that makes this problem much harder is the lack of this internal correlation, okay? The lack of internal correlation, okay? Other than basically the first three rows at the bottom, right? So let me tell you, the first row at the bottom is the transaction ID. Okay?
DW: It basically goes from low to high. So green means high, right? And the second row from the bottom is the date, the third is the week. So in those, you could probably see some pattern, right? But the rest is very, very hard, right? You've seen this data now. Let me do the same experiment with you again, okay? If I ask you to fill in the missing data that I mask out, even with just 20% masked out, you have a very difficult time, right? I want you to focus on maybe one data element here. Say this one right here, okay? Look at that, right? So you could probably guess that, well, that row there looks like there are only red and blue data points, right? So should it be red or blue? Okay? How many people say red? Come on. No. How many say blue? Okay. All right. Okay. So what if I reveal more, right? The more I reveal, it still does not tell you very much about what that data point should be, okay? It's very hard, right? That lack of correlation makes this a much, much harder problem for generative AI, right? So in fact, it is actually blue, okay? So good job, you guys are much better at generative AI than most of these other platforms, okay? So not only are we dealing with a much harder data set to work with, right? ChatGPT and PROS AI are actually very complementary, right?
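The red-or-blue guessing game can be mimicked with synthetic data. The sketch below is an assumption-laden toy, not PROS data: the `week` column stands in for the ordered rows at the bottom of the slide, the `colour` column is drawn at random to stand in for a field with no internal correlation, and the guess simply copies the nearest visible value to the left. With 20% of cells masked, copying a neighbour recovers the ordered column almost perfectly and the uncorrelated one at roughly coin-flip accuracy.

```python
import random


def copy_left_accuracy(column, hidden):
    """Guess each masked cell by copying the nearest visible cell to its left."""
    correct = 0
    for i in hidden:
        j = i - 1
        while j in hidden:  # index 0 is never masked, so this terminates
            j -= 1
        if column[j] == column[i]:
            correct += 1
    return correct / len(hidden)


rng = random.Random(1)
n = 2000

# Ordered column (hypothetical): neighbouring records usually share a week.
week = [i // 50 for i in range(n)]
# Uncorrelated column (hypothetical): each record is red or blue at random.
colour = [rng.choice(["red", "blue"]) for _ in range(n)]

# Mask 20% of the records, never the first one.
hidden = set(rng.sample(range(1, n), int(0.2 * n)))

week_accuracy = copy_left_accuracy(week, hidden)
colour_accuracy = copy_left_accuracy(colour, hidden)
print(f"week column:   {week_accuracy:.0%} of masked cells recovered")
print(f"colour column: {colour_accuracy:.0%} of masked cells recovered")
```

Revealing more of the `colour` column would not help in this toy, because the visible cells carry no information about the hidden ones.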
DW: Remember, ChatGPT is a pre-trained transformer, right? So it's trained with data from the web, from digitized books, scientific journals and Wikipedia, and it's pre-trained, meaning that it's actually trained way before it's actually used, right? So you train it, for example, with this data, right? So ChatGPT actually becomes very smart. But only after a while, because training takes time. These large language models actually take a long time to train, right? Through engineering, this training time will get shortened, but it'll never be zero. It'll never be zero. So after it's trained, ChatGPT basically has very, very broad knowledge, right? Everything from Wikipedia, everything from every book, every scientific journal paper ever written, right? But it's all in the past. It's all in the past, right? Whereas in our PROS AI, right, our ____ estimation, our forecast and our optimization are all about now and the future, right? However, we don't have such broad knowledge, right? Our applications, our algorithms are very domain specific, right? So we have much narrower knowledge, right?
DW: So we are very complementary. We're very, very complementary. Now, does that mean that you cannot use ChatGPT to tell you about the future? Can you use ChatGPT to tell you about the future? Well, remember this RAG that we talked about, retrieval augmented generation. Remember that? Okay? We could do the same thing, right? We could do it here. We could use this prediction or forecast algorithm to forecast a picture of the future, give the result to ChatGPT, and have ChatGPT summarize the result. And if the prediction confidence level falls below a certain threshold, right? It doesn't have to be zero. If it falls below an acceptable threshold, say like 30% or 20%, whatever, then you just say, "I'm not sure." Right? So this, I call it prediction augmented generation, right? PAG. Well, since it's PAG, why not just call it PROS Augmented Generation, right? So with this, even Baby Yoda is gonna say, this is the way [laughter], this is the way because it's the way to the future. Thank you.
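The prediction augmented generation idea described above can be sketched as a small gate in front of the language model. Everything here is a hypothetical illustration: the `Forecast` fields, the `pag_answer` function, the 0.3 default threshold (taken from the "30% or 20%, whatever" remark), and `echo_llm`, which stands in for a real model call so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Forecast:
    metric: str        # what was forecast, e.g. "units sold next quarter"
    value: float       # the forecast itself
    confidence: float  # between 0.0 and 1.0, supplied by the forecasting model


def pag_answer(
    question: str,
    forecast: Forecast,
    llm: Callable[[str], str],
    min_confidence: float = 0.3,
) -> str:
    """Hand the forecast to the model only when it is confident enough."""
    if forecast.confidence < min_confidence:
        return "I'm not sure."
    prompt = (
        "Summarize the forecast below to answer the question.\n"
        f"Question: {question}\n"
        f"Forecast: {forecast.metric} = {forecast.value} "
        f"(confidence {forecast.confidence:.0%})"
    )
    return llm(prompt)


def echo_llm(prompt: str) -> str:
    """Stand-in for a real model call (assumption): echoes the forecast line."""
    return "SUMMARY OF: " + prompt.splitlines()[-1]


question = "How many units will we sell next quarter?"
confident = Forecast("units sold next quarter", 120.0, 0.8)
shaky = Forecast("units sold next quarter", 120.0, 0.2)

print(pag_answer(question, confident, echo_llm))
print(pag_answer(question, shaky, echo_llm))
```

The language model never sees the low-confidence forecast, so it has nothing to embellish; the gate answers on its behalf.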