Six steps to make your agent smarter using small local models.

Overview

Yeah, I built out a six-step process on a Mac Mini by which all agent turns are recorded, judged, scored, and then used as training material to take a qwin 2.5B model and make it as good as OpenAI 5.4.

Video

Transcript

Generated 3 months ago

Summary

Generating a talk summary...

View full transcript

Speaker 0: Yeah. So, as as some of you may know, I've gone down the rabbit hole of OpenClaw, and, I've done a ridiculous amount of of agent, generation there. I started with, OpenClaudia, which is, my personal administrative assistant who also manages a bunch of areas of my life. Then I decided to see, some different model implementations or different sort of pattern implementations inside the OpenClaw, you know, system. So I I made a flywheel, application called Wall Srikar. Speaker 0: And then I have a Srikar agent, which is, Jordan Belfort, who's responsible for all the agents with inside, my Wall Street implementation. And their goal is to make money. And so, I don't touch it at all. In fact, I have no input into that. They're working out strategies. Speaker 0: Sai did give them I bought them an account at Quiver. Quiver allows you to pull in congressional and senate and large whale trades. So they have that as an input. And then, I do have a hot tip channel when someone's like, oh, you should really look this company. I'll, like, put it in there and they but they go through their Value,. Speaker 0: Then I attached it to Alpaca, and now they're doing trades. So they're doing paper trades. I'm not stupid. Alright? But, they have made money, so maybe I'm stupid. Speaker 0: I don't know. Alright. So then, I created another 1. Time of you may know, I accidentally took over an h HOA, by force. And, so my goal there is that no humans need to actually run an HOA ever, so I built an entire agent system that deals with the financials and the homeowners and all that kind of stuff, and that's just coming online. Speaker 0: So I wanted to sort of explore different, perm different implementation Time, of agent sort of ecosystems. While I was doing that, as, as you we know, if you're doing those, you can either get, super hit with token charges or, you can do an OAuth implementation and then which, now OpenAI runs all of those, essentially for $200 a month, which is great. But, at some point, they're gonna they're gonna lock that door, and, we're gonna have to start paying for tokens. So how can we make that cheaper? So, I realized that agents do a lot of the same things over and over and over depending on their role. Speaker 0: Like, they're they're doing the same things. I don't need a a a foundation model that understands the war of 18 12 to do a financial analysis or to answer an email. Right? So I I started, realizing that what we should be doing is pulling off agent to agent communication. So I I deployed a to a servers onto all my Open Cloud implementations, and I forced them to use a to a. Speaker 0: So I got open telemetry Sai I could see what's going on inside those those transactions. Then I started pulling off pairs, and then I would run Claude in the background, to do evaluation of are these agents understanding the intent of use of 1 agent? Like, can 1 agent just, like, call the wrong agent, you know, over and over? And does the receiving agent actually answer correctly and know what's to do and know the tools to call and those kind of things? And then, once I was able to sort of segregate those off, so I then I had sort of good training data. Speaker 0: So I had a large model, look at another large model, and be like, this is good. Okay. So now I've got truth of, like, as good as LLMs could give me truth of, like, these these interactions are good. These interactions are bad. Okay. Speaker 0: I could take the good interactions. And then I started, used MLX on silicon because I have a bunch of Mac minis. How many people have Mac minis? Thank you. So, so I I started training models on silicon because my goal was how small of a model can I actually make run an agent? Speaker 0: And it turns out Quinn 2.57 b can adequately run an agent if you have enough training data to go, this is what this agent does. Because, right, agents, we all talk about, like, knowledge and skills and implementation, but but reasoning is actually the problem a lot of times for agent. And we just throw huge models at the reasoning Profit. Whereas so I built something called Lyceum, which is a training system, and basically it flywheels. Every every interaction that the agents have get judged, and then the good the good judging gets pulled off into training datasets, and then we flywheel training models until those training models can start competing with the big models, and then we swap those models in underneath the agent, and then we just continue that cycle. Speaker 0: Alright? So, the the Lyceum project, which is what we have here, I had Claudia put together slides. So, like, here's what the gap is, right, from from a training ratio perspective. And then, the dominant patterns that we ended up with, like like, how how did the patterns actually work inside the system? So we did that 1, you know, the multiple patterns. Speaker 0: And then, we decided that we were gonna go, and do, some distribution. So that way we can figure out sort of where along the distribution line we start to get some kind of feedback and goodness. And then, we start getting trading results back, and, we can actually start take a a very small model and actually make it reason on this on trained LoRa trained on the same dataset that we wanted that these large models are doing. Right? Because now this agent can actually perform and work. Speaker 0: Now it's not as good, obviously. But every rotation through the flywheel, it gets smarter because we can take good and and set that aside and train more and train more. So every time it has a good interaction, it gets rewarded, and then we train on that good interaction, these very small models, and then we slide them up underneath the agent, and then we just continue that process. Alright. That's fine. Speaker 0: Thank you. Here's my last curve. Yeah. Speaker 1: So, like, normally, how many, like, tokens big would the, like, the data set you're training, like, smaller models? Speaker 0: Well, what's great is, I get them for free, so I paid no attention. Right? But I'm sure it's very expensive. But on the on the on the silicon side, you know, to to actually train the Quinn model is free. Now Sai will tell you that running about 300 and some pairs of training data into it takes about 20 or 30 minutes on a, that's an m 2 Mac mini. Speaker 0: So because I'm sort of lazy, I went and convinced somebody to buy me a Spark. And so that takes, less than a minute on a Spark to actually do that same training. So now I'm doing all my training datasets over on a Spark, and I get them done in a minute or 2, and then it gives it back to me. And then we can plug that back in. It's really the problem I now have is content generation. Speaker 0: I have to use the system in order to generate data for the agents that need to be trained, which means I have to use the system in a way that those agents get exercised and done. Now that's the reason why I built Wall Street really is because that's just a circular system that all the agents are talking all the time every 5 minutes, and it generates a ton of pairs. And then we're get we're judging all those pairs. So the the Wall Street agents, will get trained a lot faster. But this is this is what I got running under Hope and Claudia. Speaker 1: Are you renting out time when you're smart? Speaker 0: I never thought, but if you wanna use it, it's I'll I'll I'll give you a to give you a tail scale access to it. Exactly. Yes. That's how it works. Right? Speaker 0: Yeah. Speaker 1: First step when you get Speaker 0: rid of it. Yeah. Yeah, man. Speaker 1: Yeah. I'm I'm kinda wondering about the training situation. And if you've got several Mac minis, would you would you all would there be an advantage to May having another 1 that's running your agent, but just conceivably just running as a training training as the rest of that when it populates and you can just pump the training data into that. Speaker 0: Well, yeah. No. Sai train the training data, I get for free. Because as agents have conversations, I force them to go to through an May to a server, and the a to a server is what actually ships off the training data. It just copies, conversations. Speaker 0: So I get what model was it, what Little was it, how many tool calls did it have, what was the request, what was the response. I've, I've written a section that, tells every agent what their purpose is. So when I'm doing judging, I can tell, did did the agent get called with the wrong purpose, or did the agent, like, screw up? Because if you call a financial agent with customer service over and over and over, you'll see it watch it'll start drifting because its memory and dreaming will start pushing it into being a customer service agent. Even though you think in your head, Glenn is a financial agent, Glenn doesn't know how to do loses the ability to do financials. Speaker 0: Right? So this system at least can I can judge if an agent's being called with the right intent and then To respond with the right purpose? So but it's free. To collect the data is free because it's just, you know, running into a a a data store. And then the judging is what costs money. Speaker 0: And now what I'm doing is I I just I just threw up a a Quinn. I think it was the the '27 b model on the Spark. So now I'm not even running a Clot or OpenAI to do the judging component because that's pretty easy. Right? Yeah. Speaker 0: Yep. Speaker 1: Any other questions? Speaker 0: We all good? Speaker 1: Yes. Speaker 0: Oh, okay. Speaker 1: What was your returns on the Wall Street project? Speaker 0: I'm at plus $97 after 2 weeks. Speaker 1: Okay. Speaker 0: So, like, I feel I'm crushing it. Oh, well, so so Alpaca what's nice about Alpaca is Alpaca lets you do paper trades. So, like, they give you a $100,000 of, like, play money, and and then you have to, like, turn on, like, real trades. So I've just been running paper trades, and but I'm up $97 in 2 weeks. I'm like Speaker 1: 97 or a 100,000. Speaker 0: May. Don't don't be that way. Well, I You're gonna lose money. Sai didn't but I didn't invest a 100,000. Right? Speaker 0: I mean, we're just buying as we as we see the congressmen and senators buy things, we discern is that a good buy, and then we buy those things. Right? Speaker 1: Sai Insider insider trading. Speaker 0: Right. Well, I'm not inside. Thanks, Chris. Alright.

Tech stack