
Beyond Agents: The Path to AGI and ASI

Large language models amaze, but true autonomy demands more. In this visionary talk, Dr. Michael Wu lays out the recipe for turning “dumb” LLMs into smart agents that reason, act, and learn—then charts the exponential runway to Artificial General Intelligence and, ultimately, Artificial Super Intelligence. Explore the new scaling laws of knowledge, the ethical imperatives of Responsible AI, and what businesses must do now to harness the power—and manage the risk—of intelligence beyond our own.

Featuring:

  • Dr. Michael Wu, Chief AI Strategist, PROS

Full Transcript

Michael Wu: Okay, does anyone remember this music? Anybody know where it came from? Let me give you a hint. Yeah, Matrix, it's The Matrix. So today we are going to talk a little bit about The Matrix, because we are going to talk about this guy, Agent Smith. Okay? And some of you may know that The Matrix is actually a movie inspired by a real-world concept: software agents. Okay? So Agent Smith, for example, is goal-directed. He's programmed to keep order in The Matrix, and he has autonomy. He can basically do whatever he wants, whatever he thinks is necessary to achieve that goal. And he's context-aware, right? He can monitor all the anomalies within The Matrix and basically appear out of nowhere. And finally, he can basically use anything and anyone within The Matrix, and that gives him these superhuman powers. Okay?
...
Michael Wu: So now, who can tell me when The Matrix, the movie, was actually released? 1999, okay? Exactly. So way back in 1999, we already had the notion, the concept of software agents. So the question is, what's all the hype about agents now? What's all the hype about agents today? So this is one of the many questions that we will answer throughout this talk. But before I begin, let me tell you a little secret behind the scenes. About four months ago, when the event marketing team asked me what I was going to talk about at Outperform, I said we could talk about one of three things. 2025 is known as the year of agents. So obviously, one of the topics that we could talk about is agents, right? But I said I could also talk about artificial general intelligence, because that's something really exciting: for the first time in human history, we finally get a glimpse of intelligence that's comparable to, and beyond, our own. Okay?

Michael Wu: And finally, I could also talk about something that I'm very passionate about, which is the responsible use of this technology. So the event marketing team gave me a very tough problem. They said, well, why don't you just talk about all three? So this is my best attempt to weave these three topics into one narrative. If you don't like what you hear, you know who to blame, okay? But if you do love what you hear, you should go thank them, because they pushed me. Okay, so let me begin. So OpenAI has a model for the development of their generative AI, and it goes in five stages. Let me just briefly tell you these five stages. The first stage is what they call the chatbot. Essentially, these are the large language models that we are all very familiar with already. We have ChatGPT, which most of us already use, and all the different flavors of chatbots.

Michael Wu: But these chatbots, of course, are not very smart, right? They hallucinate a lot. If you remember from a couple of years ago, when I talked about this, we have to augment them with other technology, like search engines and so on, to reduce their hallucination. So we definitely passed that stage, right? We definitely have passed that stage, because this was two years ago, right? So the next stage is what I call the problem solver. Now, problem solvers are much, much smarter. They look exactly like a large language model. You use them just like a large language model. You probably don't even know the difference when you're actually using them. But the difference is that these problem solvers are able to reason. They're able to think and solve problems for you, even new problems that they have never seen before. It's really, really amazing. Okay?

Michael Wu: So these are what OpenAI calls their O-series of models, okay? These are o1, o3, the Claude thinking models, you know, so anytime you hear about these thinking models, these reasoning models, these are the type of LLM that we're talking about. These are the problem solvers, right? Even DeepSeek R1 is that kind of model, okay? They actually are able to think and reason, okay? And then that brings us to the agent, okay? So we have definitely passed the problem solver stage as well, okay? That's why we're here today, right? 2025, the year of the agent. So we will hear a lot more about agents later on, but what's after the agent? After the agent, we have the notion of the innovator, okay? The innovator is basically an AI that can actually innovate and improve itself. So it actually gets smarter and smarter, you know, all by itself, okay?

Michael Wu: And you know what happens then, right? You've reached the point of what we call singularity, right? This knowledge explosion, and that's actually how you get to AGI, okay? And AGI is popularly known as artificial intelligence that has the intelligence level of a human being, okay? We're going to go into that in a little bit more detail later on. But before I go into the deep dive, let's review a little bit of what we know about these chatbots, these large language models, okay? So as you probably remember from two years ago, if you did attend Outperform two years ago, these large language models attempt to predict the next word, right? You give me a sequence, and I'll just try to predict what the next plausible word, not even the most likely word, but a plausible word, could be. And they do this by modeling a probability distribution, P(W | S): the probability of the next word W given the sequence S.

Michael Wu: Notice the clicker worked this time? Okay. All right. So it models this probability distribution. This initial sequence, S, is typically provided by the user as a prompt, right? You give me a prompt, you give me a sequence, right? So that initial sequence is going to define the probability distribution over all the words in the language. And since you have the full probability distribution over all the words in the language, you can actually sample from that probability distribution and you get a word out, okay? So once you sample from that, you get a word out. And what you can do is concatenate that word with the sequence, so that sequence becomes one word longer, right? And you can put this longer sequence back into your probability distribution, sample the next word, get the next word out, and sample it again and again and again. So that's how ChatGPT generates sentences, paragraphs, and the entire response to your prompt, okay?
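To make that recursive generation loop concrete, here is a minimal Python sketch. The `next_word_distribution` helper is a hypothetical stand-in for the trained model's P(W | S); this illustrates the sampling loop described above, not any particular model's API.

```python
import random

def next_word_distribution(sequence):
    """Hypothetical stand-in for the trained model: given the sequence S,
    return P(W | S) as a {word: probability} mapping over the vocabulary."""
    raise NotImplementedError

def generate(prompt, max_words=100, stop_token="<eos>"):
    sequence = prompt
    for _ in range(max_words):
        dist = next_word_distribution(sequence)         # P(W | S)
        words, probs = zip(*dist.items())
        word = random.choices(words, weights=probs)[0]  # sample a plausible next word
        if word == stop_token:
            break
        sequence = sequence + " " + word                # concatenate and feed back in
    return sequence
```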

Michael Wu: It's doing that by sampling from this distribution one word at a time. That's why when you actually type in a prompt to ChatGPT, you see the response come out one word at a time. This is exactly what it's doing, okay? So now we also know from a couple of years ago, because this probability distribution is actually not zero anywhere, okay? You can actually generate content that actually never existed before, okay? Truly novel content that didn't exist before. And what that means is that if you look at the probability, the empirical probability of the response, right, from all the training data, you will see that the probability, empirical probability is exactly zero, exactly zero, okay? The example that I give is basically, you know, if I give you a sequence, the sky is, there are many, many possible words that could go after the sky is, right? But you probably never hear people say the sky is delicious, right?

Michael Wu: So the word delicious essentially has an empirical probability of zero, exactly zero, okay? All right. But that's not the true distribution, right? The true distribution, P(W | S), is actually not zero anywhere, right? So that means you can still make the language model say the sky is delicious. All you need to do is prompt it the right way, right? You prompt it by redirecting its attention to focus on words about something delicious, such as saying it this way: in candy land, the clouds are made of cotton candy; the sky is... Now, all of a sudden, the language model is very, very likely to say the sky is delicious, okay? So that's what we know about how large language models work today. So how do we take these large language models that hallucinate, that truly are not very intelligent, and turn them into truly smart, PhD-level-smart problem solvers, okay?

Michael Wu: So we all know that reasoning is actually fundamental to all problem solving, okay? And there is something called chain-of-thought prompting. Chain-of-thought prompting is a mechanism, a way to improve the performance of these large language models, okay? So let's take a look at an example of how that works, okay? An example would be if you write the prompt: a farmer has 10 sheep, all but three ran away, and so he got two more; how many sheep does he have now? How many sheep does he have now? You guys can do arithmetic? Okay, all right, you guys know what the answer is, right? But sometimes, sometimes, and not all the time, sometimes the language model will tell you the response is nine sheep. And you know that's incorrect, right? That's incorrect. But let's think about why it's doing that. Why it's doing that.

Michael Wu: If you look at this prompt right here, if you simply remove one word, just that word, okay? A farmer has 10 sheep, but three ran away, so he got two more. How many does he have? Then nine sheep would be correct, right? Okay, nine sheep would be correct in that case. So that actually tells you something. The language model is really, truly dumb, truly, you know, not very smart there, okay? It's looking at the sequence, right? So if you give it this particular sequence without the word all, then the next plausible words are nine and then sheep, right? And that would be correct, right? By adding just one more word here, all, it actually screws up; it didn't take that into consideration. One word is not enough to change the probability of the next word. So it still says nine sheep sometimes, okay? So keep in mind, the language model is not very smart. All it's trying to do is predict the next word, okay?

Michael Wu: All right, so what is chain-of-thought prompting? Chain-of-thought prompting is basically: you take that exact prompt, exactly as it is, okay? And then you simply ask it to think step by step. Think step by step, okay? Now, the language model will give you another response. It will look like this. It will say the farmer starts with 10 sheep. The phrase all but three ran away means that three sheep did not run away, okay? All right, so the farmer has three sheep left, he got two more, so now the answer is five sheep. And that would be correct, right? That would be correct. You know what the correct answer is, right? Okay, so now this very simple problem has four steps, four reasoning steps, four thinking steps. It's thinking this through in four steps, right? Okay, now the interesting thing is, why does chain of thought work? Why does chain of thought work? This should be something very strange, you know, because you are not training the model, right?
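As a concrete illustration, the chain-of-thought version of that prompt could be sent like this, using the OpenAI Python client; the model name is just an example, and the comment shows the kind of reasoning you might get back, not a guaranteed response.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

prompt = (
    "A farmer has 10 sheep, all but three ran away, and so he got two more. "
    "How many sheep does he have now? Think step by step."  # the chain-of-thought cue
)

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name; any chat model can be used here
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
# Expected reasoning: "all but three ran away" means 3 sheep remain; 3 + 2 = 5 sheep.
```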

Michael Wu: You did not make any knowledge enhancement to the model. You're not augmenting it with any additional knowledge through retrieval-augmented generation, through RAG, and you're not retraining it, you're not refining it. There's no knowledge enhancement whatsoever. Why does simply asking the model to think step by step improve its performance? Why? This should be something very strange, okay? But if you actually look at how these large language models work, they work through this recursive generative loop. And we know that if you write a very, very long and very, very detailed prompt, then the language model will actually give you a much more accurate response, right? If you write a very, very detailed, very long prompt, you'll get a more detailed response. Anybody have experience with that?

Michael Wu: That's why when they teach you how to write prompts, they say, like, write more detail and it will actually give you a better response, right? Okay, so the interesting thing is that when the language model is prompted to think step by step through this chain-of-thought prompting, it has to generate a lot more words to explain the steps, to explain every step that it's actually thinking through, correct? Right? Okay, but the thing to realize is that it actually doesn't matter whether you wrote the words in the prompt or the words were generated by the language model; they all go back into this probability distribution before it can generate the next word, okay? They all go back into this probability distribution before it can generate the next word. So the effect of chain of thought is like writing a very, very detailed and very, very long prompt, except that you don't have to write it. The language model is generating that for you, okay? Generating all the words.

Michael Wu: Remember, whether all these tokens, these words, are written by you or generated by the language model, it almost doesn't matter. They all go back into this distribution before it can generate the next word, okay? So that's how chain of thought improves the accuracy of the language model, okay? So now, as you've seen before with the problem with the farmer, it took four steps. Many real-world problems may involve tens of steps. Even, I would say, some of these challenging scientific problems or inquiries may involve hundreds of steps. Hundreds of steps. Now, if your large language model has an accuracy of 99%, that's actually pretty accurate, right? In most cases, large language models don't actually get to that level of accuracy, okay? So if one single chain-of-thought step is 99% accurate, what happens after 20 steps? Your accuracy drops to barely above 80%, okay?

Michael Wu: After 50 steps, your accuracy is barely above 60%. After 100 steps, forget it. Your chain of thought is no longer addressing your original problem anymore, okay? So what that means is that deep reasoning, these long chains of thought, actually require your LLM to be almost 100% accurate. Almost 100% accurate. And that's really, really difficult. Really, really hard, okay? Because we know these plain old vanilla language models hallucinate, right? In fact, even with retrieval-augmented generation, which you learned about a couple of years ago, the hallucination rate is typically still well above 10 to 15%, okay? So how do you get an accuracy that's almost 100%? That's the question. How do you improve the accuracy of a large language model? So a language model, as we know, is essentially a probabilistic model, right? You have to model this probability of the next word given the sequence, right? And this model needs to be trained.
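The arithmetic behind those numbers is just the per-step accuracy raised to the number of steps, assuming each step succeeds or fails independently:

```python
# Accuracy of a full chain of thought when each step is independently 99% accurate
for steps in (20, 50, 100):
    print(steps, round(0.99 ** steps, 3))
# 20  -> 0.818  (barely above 80%)
# 50  -> 0.605  (barely above 60%)
# 100 -> 0.366  (the chain has likely drifted away from the original problem)
```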

Michael Wu: Sometimes this training is called pre-training or fitting the model, right? So this training process basically determines the parameters of this language model, of this probability distribution, okay? It determines all the parameters. Okay, once it's actually trained, when you're actually using it, that's called inference, okay? Sometimes that's called testing or scoring, right? Different people have different ways to describe this, right? So at inference, you are not changing the model anymore, okay? After you finish training, the values of those parameters of the model are frozen. They're frozen in, okay? They don't change anymore. And when you're actually using this language model, you simply provide it with a sequence, a prompt, right? And you go through that recursive generation loop to generate the next word, and the next word, and the next word, and that gives you the response, okay? So keep in mind, when you're actually using it at inference time, at test time, you're not changing the model at all, okay?
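As a minimal sketch of what that looks like in practice, here is inference with a frozen model using the Hugging Face transformers library; the checkpoint name is just an example.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # example checkpoint; any causal LM works the same way
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()  # inference mode: the trained parameters are no longer updated

prompt = "In candy land, the clouds are made of cotton candy. The sky is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():  # no gradients, no training -- just the recursive generation loop
    output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=True)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```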

Michael Wu: Now, the question is, how do you improve the accuracy of the large language model? So traditionally, if you know anything about statistics, there are a couple of ways you can improve the performance of a language model. One is that you use more data. You train it with more data, okay? The other way is you train a bigger model. Now, both of these approaches initially help, okay? But after a while, they plateau off, right? It doesn't help anymore. It turns out that data volume and model size are linked, okay? You cannot scale up one without scaling up the other. The one that you did not scale becomes a bottleneck, and that's why these things plateau off, okay? But if you do scale both of them, the data volume and the model size, up together, what you see is this very nice power-law drop-off, okay?

Michael Wu: So this graph basically says that if you train it with more and more tokens, more and more words, so the data volume increases, your error actually decreases, okay? Meaning the accuracy increases. Error decreases, accuracy increases, right? Same thing for model size. If you train a bigger and bigger model, with more and more parameters, the error also decreases. It looks like a power law because the x-axis here is logarithmic, okay? So since the error decreases, your accuracy actually improves, okay? All right. So what about compute? So initially, compute does help a little bit, okay? But after a while, you just overfit. So not only does it plateau off, it actually gets worse at generalization when you're actually testing. When you're overfitting a model, your actual performance drops. It doesn't just plateau off, it actually drops, right? So it turns out compute is also linked to data volume and model size, okay?

Michael Wu: So again, you can't scale up one or two of them without the others, okay? The one that you did not scale eventually becomes your bottleneck, right? But if you do scale all three of them together, you get, again, this nice power-law drop-off, okay? Error decreases, accuracy improves, right? So what that means is that an exponential increase in training resources gives you a linear increase in accuracy, okay? For example, if you train the model with 10 times more data, and you train a model that's 10 times bigger, and you train it for 10 times as long, you're going to get about a 10% improvement in accuracy, okay? In reality, it's actually a little less than that, but close enough, okay? All right. So this is actually what kept OpenAI running for the past 10 years, okay? They just keep feeding their model with more data, train it with a bigger and bigger model, throw more compute at it, right?
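For reference, one published form of this joint scaling behavior is the Chinchilla fit (Hoffmann et al., 2022), which expresses the loss as a sum of power laws in parameter count N and training tokens D; the chart the speaker shows may use a different parameterization, but the shape is the same:

```latex
L(N, D) \;=\; E \;+\; \frac{A}{N^{\alpha}} \;+\; \frac{B}{D^{\beta}},
\qquad \alpha \approx 0.34,\; \beta \approx 0.28,
\qquad C \approx 6\,N D
```

Here E is the irreducible loss, the two power-law terms shrink as model size and data grow together, and compute C is tied to both through the rough training-cost estimate C ≈ 6ND, which is why all three have to scale in tandem.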

Michael Wu: The problem that they run into today is that we are out of publicly accessible data. Every possible data source that they could possibly use, they have used already, okay? There's no more publicly accessible data. There is lots of data behind firewalls, behind corporate walls, right? But that data is not publicly accessible. They can't use it to train their language models, okay? Now, the fact that these three components are linked means that you cannot just continue to scale up your compute and model size, because data becomes your bottleneck, right? If you just add more compute to train a bigger model, because you don't have enough data, it just won't help eventually. Eventually, it just plateaus off, okay? So, what can you do to make the language model perform better? What else can you do? According to traditional statistics and machine learning, there's basically nothing you can do on the training side anymore, right?

Michael Wu: Because on the training side, what can you do? Anything that you do, you're still constrained by the data that you have, right? So there's nothing you can do on the training side. What about the inference side? What they call test time, the inference time. Okay? So this is actually an idea that we borrow from quantum computing, okay? You simply accumulate statistics, okay? The idea is very simple. What if I ask the language model the same question a thousand times? If I ask the same question a thousand times, it's going to give me a thousand different responses, right? Okay? And the idea is that you can now pick the best answer out of this thousand, okay? Pick the best one. All right, there are actually many, many ways to generate a variety of responses. And all of them involve turning up what we call the temperature parameter.

Michael Wu: If you turn up the temperature parameter in your language model, it actually becomes more creative. Okay? So if you turn it up and you ask the same question a thousand times, it's going to give you a thousand very creatively different answers. And you can actually pick the best answer out of this thousand, right? For example, you pick the one that's most consistent, right? Or there's some way to verify which answer is actually best, right? And there are also many, many ways to determine what is best, to pick the best one, right? You can have a human be the judge and pick the best one, or you can have a reward function, basically some kind of model, right? Or, in problems like math or coding, you know what the actual answer should be, right? In coding or in mathematics or science, there is one correct answer, okay? In those cases, you can just use a verifier, okay? A verifier could be, for example, a compiler, right?

Michael Wu: If you ask a language model to write some code that does whatever you want it to do, it will generate code. And if you give that code to the compiler, and it compiles and runs, and it actually does what you told it to do, then it's good code. So that compiler is essentially your verifier, okay? Because you know what the true answer should be, okay? All right. And another way is that you can also simply use a language model to be the judge. The language model can play the role of generating the response, and also be a judge to critique the response. And when you do that, you have something called sequential refinement, okay? What you do is that you have the language model generate one response, you have the same language model play the role of a critic, okay? And critique it, and then the language model can take that critique and feedback and generate a better response in the next round.

Michael Wu: And you could just do this over and over again, do this a thousand times, right? After a thousand iterations, the answer will be really, really good, right? Because you refine it every time it gives you a critique. Now, one thing: whether you're doing this parallel generation or sequential refinement, these are what we call thinking behaviors, right? These are the thinking behaviors. When it's generating all these answers, generating all these refinements, it's actually thinking, okay? The model is actually thinking. So now, this scaling of inference-time thinking actually requires a lot of working memory, okay? And to a language model, this working memory is essentially the context window, the context length. So if you are doing parallel generation, you have to remember all the answers that you generate. If you generate, you know, a thousand answers, you have to remember all thousand of them.
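A minimal sketch of both test-time strategies, with hypothetical `ask_llm(prompt, temperature)` and `score(answer)` helpers standing in for the model and for whatever judge you use (a verifier, a reward model, a consistency vote, or a human):

```python
def ask_llm(prompt, temperature=1.0):
    """Hypothetical call to the language model; returns one sampled response."""
    raise NotImplementedError

def score(answer):
    """Hypothetical judge: a verifier, reward model, consistency vote, or human rating."""
    raise NotImplementedError

def best_of_n(question, n=1000, temperature=1.0):
    # Parallel generation: sample many diverse answers, keep the best one.
    candidates = [ask_llm(question, temperature) for _ in range(n)]
    return max(candidates, key=score)

def sequential_refinement(question, rounds=10, temperature=0.7):
    # Sequential refinement: the same model critiques and improves its own answer.
    answer = ask_llm(question, temperature)
    for _ in range(rounds):
        critique = ask_llm(f"Critique this answer to '{question}':\n{answer}", temperature)
        answer = ask_llm(
            f"Question: {question}\nPrevious answer: {answer}\n"
            f"Critique: {critique}\nWrite an improved answer.", temperature)
    return answer
```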

Michael Wu: If you generate a million answers, you have to remember all million of them, right? Because if you don't remember all a million of them, what if the best one was the second one? What if the best answer was the second one, right? You would have forgotten it, right? If you only remember half million, right? What if the best answer is within the first half million? You wouldn't have known. You would have forgotten it, right? So it has to have that big working memory. And likewise for sequential refinements too, right? As you refine these responses, you have to know that after a certain amount of iteration, you have to make sure that you don't go back to like the second response or the third response, right? So it has to remember the entire sequence of refinement to ensure that you don't actually go back to square one, okay? So the trend in the market is that this context window grows bigger and bigger, right?

Michael Wu: So starting with ChatGPT 3.5, which we are all very familiar with, it had a 4,000-token context length, right? Today, with GPT-4, they have 128,000 tokens, okay? Claude has a 200,000-token context length, right? And with Google Gemini, you can go up to 2 million with the Pro version. And with the recent release of Llama 4, you can even get to 10 million tokens, okay? Now, this market is actually very, very competitive. I know for a fact from my friends at OpenAI that they're actually working on a method to essentially give these language models an infinite context length, okay, an infinite context window. So, infinite memory. So maybe very soon we won't have to worry about working memory or context length anymore, okay? But that's coming, okay? So now, if you let these language models think and give them memory, right? In order to think, you have to give them a lot of memory. Then they become super, super smart, okay?

Michael Wu: They become really, really smart. And we tested this by asking these language models really difficult questions: competition-level math problems, competition-level coding problems, PhD-level science questions, okay? And if you ask these questions of ChatGPT, say GPT-4o, okay? Which is the language model, the chatbot type, okay? Not the smart one, not the problem-solving one that I just told you about, okay? They actually do decently with these PhD-level questions, okay? 56% accuracy, okay? But keep in mind, these language models have read every single book and every single published journal research paper out there, right? So they can answer those questions, but they fail miserably when you ask them to solve a math problem, and when you ask them to actually write some code for you, they fail, completely fail. You know, around 10% accuracy, right? Because they can't reason, can't think, right? But what happens if you let it think? Then the story changes.

Michael Wu: With the O-series of models, the o1-preview and the actual released version of o1, they go from failing to passing spectacularly, okay? 90% accuracy, okay? And on this one, you know, it was still better, but now it's not such a huge difference, but still better. But the thing I want to draw your attention to is that it's beating human experts. Smarter than me, okay? So, beating human experts, okay? Now, the thing that's really interesting is, if you look at the accuracy of these models, right? So this is the performance of the O-series model, the o1, on these competition-level math problems. And if you scale the training time, if you train it longer and longer, it will help, but eventually it plateaus off. But what they find is that if you continue to scale test time, test-time compute, okay? Just generating more and more varied answers and picking the best one, or doing sequential iterative refinement, doing that a thousand times, a million times, right? They don't see any signs of plateauing. There is no sign of saturation.

Michael Wu: What that means is that you can continue to scale up this test-time compute, and the accuracy of these language models will continue to improve, so far, okay? We don't see any sign of saturation at all. So now, this is actually perhaps the most important discovery in AI for the past, I would say, several decades, okay? The most important discovery in AI is test-time compute, okay? The performance of these language models scales as a power law of your test-time compute. You simply let it think longer, and you can actually improve its performance, okay? Very, very amazing. And the beauty of it, why it is so important, is that all you need is compute. Remember, whether you are generating a million answers and picking the best one, or you are doing the sequential refinement, refining it a thousand or a million times, all you need is compute. You're not changing the model's parameters. You don't need a bigger model. You don't even need more data.

Michael Wu: All you need is compute. That's why trillions of dollars are actually pouring into building data centers, because all you need is compute. People are investing in quantum computing and fusion energy to power these data centers, because all you need is compute, okay? And now everyone has access to a PhD-level problem solver. Isn't that amazing? Okay, so let's put this PhD-level problem solver to work, okay? So I actually recently moved into an apartment and I just needed a very simple desk. So I found one that I liked. I said, well, this looks like a very simple desk I can actually build myself, okay? I'm a very hands-on person. So leveraging the multi-modality capability of the language model, I can just ask ChatGPT how to build this table, right? And it will give me all the tools I need to build this table and every single step that's needed, right? From assembling, to painting, even to, at the end, how to enjoy the table, right? But at the end, I still don't have a table.

Michael Wu: I still don't have a table, right? I can know everything about how to build a table, but I still don't have a table. This is actually where we need agents, okay? Now, this guy's coming back, okay? Agent Smith, right? So agents are definitely goal-oriented, right? They have autonomy, they're context-aware, and the hallmark that makes these agents different from the previous iterations of AI is the fact that they can use tools. And using tools means that they can actually take action, okay? Because as you've seen with the example of the table, they can tell you everything that you need to build a table, but they cannot actually take any of those actions. You have to still build it. You still have to take those actions, right? But if it has an API that controls a robotic arm, and it's able to control that, then it's a different story, right?

Michael Wu: Okay, now keep in mind, the core of these agents is still a language model, right? And it's not the chatbot type of language model, okay? It's the problem solver variety, the smart one, the PhD-level smart one, okay? So it has these reasoning capabilities through chain of thought, and it has memory, right? It has memory through this long context window. So the way they actually work is that, because it has reasoning capability, it can come up with a plan, and because it now has access to tools, it can call those tools to take action. Once the action is taken or executed, it has to have the ability to observe and monitor the result of those actions, okay? So this is what we call the action-reflection cycle. Right? And so after it observes the result of its action, it needs to reflect. And reflecting means that it has to evaluate the result against the goal, right?
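Schematically, that plan-act-observe-reflect cycle could look like the following sketch, where `plan_next_action`, `tools`, and `goal_reached` are hypothetical stand-ins for the reasoning model, the available APIs, and the reflection step:

```python
def run_agent(goal, tools, plan_next_action, goal_reached, max_steps=20):
    """Hypothetical action-reflection loop: plan, act with a tool, observe, reflect.

    tools            -- dict mapping tool names to callables (APIs, programs, other agents)
    plan_next_action -- the reasoning model: picks the next tool call from the goal + history
    goal_reached     -- the reflection step: checks the observations against the goal
    """
    observations = []  # the agent's working memory of everything it has seen (its context)
    for _ in range(max_steps):
        action = plan_next_action(goal, tools, observations)   # plan via chain of thought
        result = tools[action["tool"]](**action["arguments"])  # act: call the chosen tool
        observations.append((action, result))                  # observe the outcome
        if goal_reached(goal, observations):                   # reflect: compare to the goal
            break
    return observations
```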

Michael Wu: Because maybe it completed the goal, and then it's done, right? But if it hasn't, if it didn't complete the goal, then it can use chain-of-thought planning and reasoning again to plan its next action. Okay? So that's how that works in general, right? So I find it sometimes useful to compare an agent with an ordinary computer program. With an ordinary computer program, if I say I want a table at the end, if you tell an engineer you want something, what is he going to do? He's going to write code for every step to tell you exactly how everything should be executed. For example, he would say that, you know, I need to go buy some wood, I need to measure it, cut it, and then you could have branching conditionals, right? If you have nails, then you will use a hammer to assemble it. If you have nuts and bolts, you will use a wrench and screwdriver.

Michael Wu: After you assemble it, you need to go get some paint, paint the big areas with a roller, and touch up the fine areas with a small brush, and then you get a table. So this is a program. That's an ordinary computer program. It's written by a user and executed by the user, or by another program. Another program can execute a program too, right? Okay? Now, how is a software agent different? A software agent is different in the sense that it can self-execute, okay? It has that autonomy. It can execute when it thinks it's time to execute. So what that also means is that it must be able to observe the environment, the context, right? To determine when the appropriate time is for it to execute itself, right? Sounds kind of gloomy, execute themselves, right? So, when to self-execute, right? All right? So these are still very task-specific and goal-directed, so they're great for automation of tasks.
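The hard-coded table-building program described above might look something like this sketch, with every step and branch written out by the engineer ahead of time (all function names are illustrative):

```python
def build_table(fasteners):
    # An ordinary program: the engineer writes out every step and branch in advance.
    wood = buy_wood()
    dimensions = measure(wood)
    pieces = cut(wood, dimensions)
    if fasteners == "nails":                       # branching conditionals decided up front
        table = assemble(pieces, tool="hammer")
    elif fasteners == "nuts_and_bolts":
        table = assemble(pieces, tool="wrench_and_screwdriver")
    else:
        raise ValueError("no assembly path was written for this fastener")
    paint = buy_paint()
    apply_paint(table, paint, tool="roller")       # big areas
    apply_paint(table, paint, tool="small_brush")  # fine touch-ups
    return table
```

A software agent, by contrast, is not handed this fixed script; it decides for itself when to run and which step to take next based on what it observes.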

Michael Wu: And even back in the early 1990s, I would say late 1990s, near the turn of the century, we already had many examples of this. For example, web crawlers, load balancers, right? And also Agent Smith. Agent Smith is actually modeled after what we call a network intrusion detection agent, okay? These agents can monitor the network, detect whenever an intrusion happens, and then go and terminate those processes, okay? Looks just like Mr. Agent Smith, right? Okay? So now, a software agent is essentially a program that has autonomy and has this context awareness; it can monitor the environment, okay? Now, how is that different from an LLM agent, a large language model agent? A large language model agent is slightly different, okay? In the sense that you still tell it what the goal is, I want a table, but you also tell it what tools it has access to, right?

Michael Wu: These are the tools that you can use, and these are the resources you have access to, the things you need to get, okay? And basically, the large language model will come up with a solution, right? It may come up with this solution: maybe it will simply go buy the wood and the paint together, measure and then cut the wood, then assemble the pieces, and then paint them with this little brush, and then you're done, okay? Or it could come up with a completely different answer, a different path of execution, right? Maybe it measures first, to know how much paint and how much wood you need, then cuts the wood, paints the big areas with a roller, then assembles the pieces, touches up, and then you get this table, right? Or maybe you tell it you want a very, very simple solution to build this table, right?

Michael Wu: Then it might say, okay, maybe you can measure your wood so precisely that you can buy the wood in the exact dimensions that you need, so you just have to assemble the pieces, get the paint, paint them, and you get the table. Then you're done, okay? But keep in mind, the solution these language models come up with depends on what tools they have access to, right? Because you have to tell them what tools they have access to. If you give them, for example, a new tool, sandpaper right here, okay? Then it will come up with a different answer, a different solution, right? It will basically tell you to measure first, measure the precise dimensions, get the wood, assemble the pieces, and then sand them before you get the paint and paint them to get this table, okay? All right. So basically, a large language model must use tools to act, okay?

Michael Wu: And in the digital world, these tools are basically programs, APIs, or other agents, right? An agent cannot do everything by itself. It actually relies on other agents as well, right? So keep in mind that agents must be able to observe and track the outcomes of their actions, right? And remember, agents are still limited by the tools and resources they have access to. So we just have to make sure that we provide access to this feedback data, so the agent can actually monitor the results and improve and plan its next action, and so on, okay? So now, why are agents such a big deal? Why are agents such a big deal today? Okay, because agents are actually a major step toward artificial general intelligence, okay? Why? So you may wonder, well, I mean, artificial general intelligence is essentially AI that has human-level intelligence, right?

Michael Wu: But how do you measure human-level intelligence? How do you measure intelligence in humans? Testing is not going to work, right? So OpenAI actually has a very operational definition of AGI. Basically, their definition of AGI is AI that has human-level performance at most of the economically valuable work, okay? The ability to do economically valuable work is the hallmark of artificial general intelligence. Basically, if you have an AI that can do the work that you and I do and get paid for, and do it just as well as you and I do, then you have artificial general intelligence. That's what that means, okay? Now it makes perfect sense why agents are such a big deal, right? Because as you've seen before, whether you have a chatbot or a PhD-level problem solver, they're passive, right? They can tell you exactly what to do, but you still have to do the work, right?

Michael Wu: But today, with agents, they can start to automate and do that work for you, okay? And do it maybe just as well as we do, okay? So that's a big step toward artificial general intelligence. But we all know that taking actions is dangerous, you know, because actions change things, okay? And that's why AI ethics is really, really crucial in this age of AI. Very, very crucial, okay? Before, whether you had a chatbot or a problem solver, you had to worry about whether the information gets into the wrong hands, right? You don't want bad actors learning how to make a bomb or create toxins or stuff like that, right? With agents, you have bigger problems to worry about. You have to worry about whether it's actually going to make a bomb and make toxins and spread them, right? So ethics is crucial. Responsible use of this technology is crucial here, right? So I have actually developed a framework for responsible AI.

Michael Wu: And due to the constraint on time, I can't actually tell you in detail what they are, but I'll just briefly go through it very quickly, right? So there are some criteria that are actually table stakes, okay? If you go out and search for responsible AI, you'll see that every company probably has some version of this, right? And they use words that are slightly different, but they basically mean the same thing, okay? So the words in the parentheses here basically mean the same thing, okay? Privacy-preserving just means they protect the data, and so on, right? Safe just means that you have human override; resilient means you don't have catastrophic failures, and all that stuff, okay? So these are table stakes, right? You have to have them. Before you have these, no one's going to use your AI, okay? It doesn't get over that adoption threshold.

Michael Wu: So beyond these table-stakes requirements, the AI has to serve the user, has to benefit the user, okay? That means the AI has to be good at what it does, right? So all these words basically mean the same thing. Being accurate, reproducible, consistent, reliable, robust means basically the same thing: the AI has to be good at what it does, right? Because if it's not good, no one's going to use it. But beyond being good at what it does, it also has to be trustworthy. If people don't trust it, they're also not going to use it. And being empathic just means that you make people actually want to use it, okay? Because it's nice, it's friendly, it's polite, right? You don't want the AI being, you know, condescending, right? Saying, what are you, stupid? Right? That's not very empowering, right? So, okay? So, empathic. So once the AI is actually serving the user, meeting the user's needs, you pass what we call the adoption threshold.

Michael Wu: Then you have to worry about a different set of components of this responsible AI: it has to be beneficial to and serve the organization and groups, right? Because within a group, you have multiple AIs, multiple people using different AIs, right? So accountability becomes important, right? Traceability, auditability, same thing, okay? Fairness becomes really important, right? I have actually written a series of four blogs out there explaining how to deal with AI biases, right? You can go look at those if you're interested, right? And finally, because AI has to work with other AIs, other agents, it has to be interoperable, right? That's when it actually starts to serve the organization and the group. And beyond that, it has to serve society, has to serve humanity as well, right?

Michael Wu: So it has to be, I would say, empowering, has to be human-centric, has to respect human agency. A lot of technology out there today does not respect human agency. Social media, for example, is highly addictive, highly manipulative. If you have that, it's not benefiting society and humanity in the long run, right? So we don't want to make the same mistake with AI again, right? So it has to respect human agency, be non-manipulative, non-addictive, and finally it has to align to human values. It has to be moral, has to be benevolent, right? And beyond serving society and humans, there's still something we need to worry about, okay? It has to be beneficial to the environment and also to our future, okay? And all that means is it has to be sustainable, right? Optimized, resource-efficient, carbon-neutral.

Michael Wu: And finally, to make sure these AIs are beneficial to our future, we have to make sure these AIs are child-safe today. A lot of us are parents here, right? Some of us are new parents, I recognize. And how many of you are comfortable letting your kids have an AI pal, an AI friend, and talk to it day and night? Think about that, okay? If it's not child-safe today, it's not good for our future. Many people forget this, right? Okay, so that's in essence what this responsible AI framework is. So again, hopefully I'll have time to write more about it in the future. So as you can see, we talked about tool usage, right? And the ability to see what tools are available at this language model's disposal is actually done through MCP, right? The Model Context Protocol. We talked about the ability to monitor feedback, okay?

Michael Wu: And because no single agent can do all the work, they also have to collaborate with other agents, right? And that's actually done through another protocol called A2A, the Agent-to-Agent Protocol. And we also talked about these safety guardrails, because they're crucially important, even more important in this age of agents, okay? And so if you look at this and compare it to what you've seen here at Outperform, some of the agents that we built don't seem to have all of these components, right? I mean, for one thing, we don't have full autonomy, right? A lot of our execution paths seem to be hard-coded. But why do you think we built our agents that way? Why do you think we built our agents that way? I want you guys to raise your hand if you are ready for this. Who here is actually ready to let agents make all their business decisions fully autonomously? Raise your hand.

Michael Wu: I don't expect very many of you would raise your hand, okay? Why would we build an agent that none of you is ready to use today? Okay? We want to build an agent that you can use and realize value from and build trust with, and then over time, once you build that trust, you just tell us, I'm ready to let AI take over. We'll just take that human in the loop out. It's easy. It's very, very easy, okay? It's very, very simple. Okay, so now that's the agent, right? How do we get to an innovator? How do we get to an innovator? The innovator is basically the idea of an AI that can create new knowledge about AI itself, okay? And not only can it create knowledge about AI itself, it can also implement this knowledge, okay? And what that means is that these AIs will essentially be able to self-improve and re-innovate themselves constantly, okay?

Michael Wu: Because they don't need to sleep, they don't need to rest, right? Okay? Now, the question is, how do you actually create new knowledge? New knowledge and new ideas are two different things. Knowledge has to stand the test of time, right? Ideas are easy, right? So we know that a large language model, a chatbot, a simple chatbot, can already generate new ideas that did not exist before, okay? We already know that. And the way that we generate knowledge today, okay, is we have education, research institutions, right? We have PhD students to create knowledge for us. So this is like the infinite monkey theorem. Anybody here know the infinite monkey theorem? No? Some do, okay? Some do, right? So the idea of the infinite monkey theorem is that if you let a monkey type on a typewriter for an infinite amount of time, type essentially forever, eventually the entire body of work of Shakespeare will appear, okay?

Michael Wu: Simply give it an infinite amount of time. Infinite is actually a very powerful concept, okay? So, of course, a monkey is not going to live that long, right? So another variant is that if you have an infinite number of monkeys typing on typewriters, eventually one of them will come up with the entire works of Shakespeare, okay? Remember, you have to have an infinite number of monkeys, right? So that's essentially what will happen if you simply ask an LLM, a chatbot, to generate new ideas for you, okay? Not knowledge, but ideas, okay? Now, that's actually not how humans create new knowledge, right? We don't just have PhD students running around generating ideas all on their own. What do we have? We have professors, right? We have thesis advisors. So in this case, what you need is a smart monkey, right? You need a professor monkey to tell whether the idea is new and whether it's a good idea.

Michael Wu: But guess what? We actually have a problem solver with PhD-level capacity. It can play the role of a professor, right? We have that. We can use just the chatbot to generate ideas, and then we can have this professor monkey, right? And use this problem solver as the professor monkey, right? And keep in mind, the professor monkey doesn't need to know how good an idea is; it just needs to know that it's better than all the existing ones. That's a much easier problem, right? You don't need to know whether it's some idea that's going to turn into a Nobel Prize or something like that. You don't need to know that. You just need to know whether it's actually better than all the existing ones before. And the professor can do that, right? Because ideally, they have read all the research within their domain, right? And now we have a problem solver that can do that for us today, right? Okay, and better yet, now we also have agents, right?

Michael Wu: Agents can act and basically monitor feedback. What that means is that they can do experiments. They can conduct experiments, okay? So now we have all the components that are necessary to build an AI scientist. In fact, this is actually what Sakana AI has done, right? Everything from the ideation of the research, to the experimentation, to the actual writing of the paper can be automated by AI, right? So they use a large language model to create ideas, they check them against a scholarly database to make sure they're actually new, and then they score them to see which idea is actually worth implementing. Then they execute an experiment template, and the AI actually writes the code to do the experiment. And this experimentation script is where it actually iterates on the experiment.

Michael Wu: It will experiment, and then if it fails, it just iterates and updates the plan, because you have an agent that can do this, right? You can just iterate this over and over again, and while it's doing this, it basically collects numerical data. Once it collects enough data, you give it all this data and execute what we call the manuscript template, and then it will just write the paper for you, okay? Once it finishes writing the paper, you have a manuscript. What do you do? Give it to the professor monkey, right? And then the paper is going to be reviewed. Once it's reviewed, it's going to give you critiques, what you can improve on, and that goes into the next iteration of your ideation, okay? So that's what Sakana AI has done. And this produced the first scientific paper where everything from ideation, to experimentation, to writing of the paper, to submission of the paper to the journal, everything was actually done by AI.
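In outline, that generate-check-experiment-review loop could be sketched like this; it is a schematic illustration of the workflow described above, not Sakana AI's actual code, and every helper function in it is hypothetical:

```python
def ai_scientist(domain, scholarly_database, iterations=10):
    """Schematic generate-check-experiment-review loop; all helpers are hypothetical."""
    manuscript, feedback = None, None
    for _ in range(iterations):
        idea = generate_idea(domain, feedback)        # the "monkey": a chatbot proposes an idea
        if not is_novel(idea, scholarly_database):    # check the idea against prior work
            continue
        if score_idea(idea) <= best_existing_score(domain):  # the "professor monkey" as critic
            continue
        results = run_experiments(idea)               # the agent writes code, runs it, iterates
        manuscript = write_paper(idea, results)       # fill in the manuscript template
        feedback = peer_review(manuscript)            # critiques feed the next round of ideation
    return manuscript
```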

Michael Wu: And it actually got accepted, okay? So AI is able to create knowledge that passes the rigor of human peer review, just like any scientist would, okay? So that's a little scary, isn't it? Okay? So now the question is, how do we get to AGI from this innovator? Okay? So there's a theory from Leopold Aschenbrenner. He basically asks, you know, is intelligence simply a result of scaling compute, okay? So if you look at the timeline of the development of AI over the past decade or so, the y-axis here is the effective compute, the amount of compute, right? And this is actually normalized to the GPT-4 level, okay? So GPT-4 level is ten to the zero, which is one. And the scale is logarithmic, okay? So each one of these steps is 10 times smarter, 10 times more intelligent, okay?

Michael Wu: So way back in 2019, we already had GPT-2. That's actually like a preschooler, okay? So, anybody actually played with GPT-2 back then? I have, okay? If you talked to it back then, you felt like you were talking to someone who's drunk, okay? It could kind of make sense, but not really, okay? And it's not very intelligent, okay? A year later, in 2020, you get GPT-3, right? And that's actually able to make grammatically well-formed sentences, right? It is essentially like an elementary-level student. Now, this model here, you can see, is 100 times more intelligent than the previous model, 100 times more intelligent, right? And when you get to 2023, GPT-4, this model is almost a thousand times smarter than the previous one, okay? And today, in 2025, we have the O-series models, right? These are no longer the plain chatbot type of model. These are the thinking models, okay?

Michael Wu: The reasoning, problem-solver models, right? The O-series models. They are essentially at PhD level. Again, this is actually more than a hundred times smarter than the previous model, right? So today, we are just at the beginning of being able to do what we call automated AI research. Remember what Sakana AI did, right? We're just at the beginning of that. And what that means is that you will essentially get to this intelligence explosion. Intelligence explosion, right? Because if AI can improve itself, that means it will get smarter all by itself, without human intervention, right? That means the intelligence level will basically just blow up, right? So this is actually how we are going to get to artificial general intelligence and artificial superintelligence, okay? So now, think about this a little bit. What if you simply apply this automated research and this knowledge explosion to every single domain, right?

Michael Wu: Whether it's physics, chemistry, medicine, or aviation, or manufacturing, or whatever domain you work in, right? Then the AI will eventually be the AGI in your domain, right? You basically will get AGI, because if it's able to self-improve in whatever domain you choose, eventually you will get AGI. And remember the definition of AGI, right? These are AIs that can do the work that you and I do today and get paid for, economically valuable work, and do it just as well as we do. Human-level performance at economically valuable work, right? That's definitely foreseeable, given what you have learned now, okay? So now, the question is, what's after AGI? What's after AGI? To be honest, no one knows what's after AGI. I don't know it, right? But one thing I do know is that electronic circuits and machines just scale better than biological systems, okay?

Michael Wu: So what that means is that once you achieve AGI, once you have artificial general intelligence, you simply let it sit around, right? And do its work alongside you, okay? Artificial superintelligence will soon follow, simply due to faster learning and better scaling, okay? So if you have an AI that can do the work that you and I do today and get paid for, and can do just as good a job as you and I do, simply let it sit around, right? Because it can learn better, scale better, doesn't need to sleep, doesn't need to rest, okay? Let it sit around for a couple of months or, you know, a year. It will be better than we are at what we do, at everything that we do, okay? That's scary, isn't it? That's scary. Now, it makes you wonder, what qualities make us uniquely human? So maybe it's morality or ethics, right? Because, you know, AI has never been a member of a society before, right?

Michael Wu: But very soon, they will be an integral part of our society. So they can actually learn our ethics and our morals as well, okay? Maybe it's feelings and emotions, because AI doesn't have what we call chemical senses or the integration of those senses. But it's not too difficult to actually attach a chemical transducer to an electronic circuit. For example, we have breathalyzers, right? They can detect the alcohol level in our breath, right? If you have chemical transducers for many different kinds of chemicals attached to an electronic circuit, then the AI will actually learn how to integrate those senses. Maybe, and maybe, still maybe, it's empathy and compassion, because one thing that I know AI doesn't have, and probably will not have, is mirror neurons in its brain. So it can't feel what others feel from interacting with them, okay?

Michael Wu: So now this is The Matrix moment that we're talking about. As Morpheus said, I can only show you the door; you're the one who has to walk through it, okay? So now, in front of you, all of you have a box with a red pill and a blue pill. I want each of you to take out a red pill and a blue pill, okay? I'll do the same. Take one out, okay? Hold it in your hand. Okay. Ready? Now, you have a choice. You have a choice. You could take the blue pill, right? In which case you will go back to your life as if nothing had happened, okay? Everything that you learned here, everything that you heard from me, will feel like a dream and sound like something out of science fiction, okay?

Michael Wu: You will wake up the next day and go to work as you always have, and nothing will change. Nothing will change, okay? Or you could take the red pill and join the AI revolution. Okay? I know that's scary, but don't be scared, because I and the entire team at PROS are here to help you navigate this brave new world of AI. We're here to ensure that our AI will help you, and is going to help you, outthink, outpace, and outperform. And I think, and I hope, that you will take the courageous choice and follow me, because I'm taking the red pill.