Cataloguing the Meta of Vibecoding
Blindly experimenting our way into the future, one vibecoded experiment at a time
I'm pretty frustrated with the state of meta information around AI development frameworks right now.
I've spent the last few days really deeply wrapping my head around Claude Code, what it was, how it worked, and how it actually differed from the prior crop of AI development tools. All I knew was that it worked in the terminal as a CLI tool, and that felt really intimidating when I was used to coding in my cozy IDE with a sidebar chat UI. I was surprised to discover that using it well was even more foreign than I expected and involved an entirely different workflow from how I've been approaching development with Cursor.
You might think I'm a little bit late to the game here (actual MONTHS behind the curve) to only now be experimenting with the CLI agents, but to be honest, I'm just frustrated with learning new frameworks only to have them invalidated a short while later. I've been a bit resistant to learning a new terminal-based tool for the last few months even while my pal Seconds was raving to me about how good the latest SOTA was, specifically because I've fallen for this “get hyped on the new AI tool” pattern over and over at this point:
I got really good at model prompting in 2020/21 just in time to have that interaction style be largely deprecated by ChatGPT-style chatbots.
I spent a bunch of time figuring out how to get RAG DBs going just a few months before 100k token context windows rolled out alongside dynamic directory scanning tools that gave us a lot more context budget to use.
I was just starting to make sense of tool definitions and function calling in the model instructions just as frameworks like Cursor started getting good enough that you could skip over a lot of that default setup.
I got really good with using tools like Cursor last summer, just a few short months before agent capabilities started heating up enough that you can almost take your hands off the codebase completely. At this point I do surgical Cursor edits or manual tweaks and let the agents do the heavy lifting on my projects.
The most frustrating part of building cutting-edge skill in the AI space is that if you drop off for even a few months, the meta has already moved on without you. It’s tempting to just throw your hands up and check back in every few months to see how much closer we are to a world where you can just tell the robot to go off and build you your thing.
One of the things I've been working on is trying to find just the right architecture prompts and frameworks to sufficiently constrain and create the guardrails for an LLM to be able to solve its own problems and execute on the code in a consistent and reliable way. This isn't a new problem space for me. I've been puzzling over this problem for five years now: In late 2020 I was doing experiments to try to see if I could iterate through a templated novel framework in chunks to try to create internally-consistent novels of 30,000 to 100,000 words using an LLM with a sub-4k token context window.
We've come so wildly far since then, and one of the weirdest things about the current state of AI right now is how comparable it feels to that early-pandemic 2020 period when I was bored, stuck at home, and had first started experimenting with the recently released open-source GPT-2 model, trying to figure out whether I could get it to produce commercially viable writing or whether it could only spit out short snippets that kind of sounded like human-intelligible writing for no more than a paragraph or two. That spring, the GPT-3 beta dropped to the wild and basically changed the world forever. Prior to that, nobody was actually sure if "AI" would ever be able to generate human-quality freeform text.
Back then, the potential that GPT-2 teased was absolutely tantalizing. It was so close to actually being useful and so close to delivering on the promise of actual human-quality writing, but it still wasn't quite good enough to just chat with. It still wildly hallucinated, still made up nonsense phrases and sentences that sort of looked like they should make sense but didn't. It couldn't stay on task very well. But god damn, it felt like being on the precipice of actual practical magic. And as we now know, it was! That's how AI coding feels to me right now. It's so close. All of the elements are in place. It even already does a good job in limited contexts.
It feels like we're just a stone's throw away from actually delivering on the promise of effective, production-ready, self-directed problem-solving. If we can just get a small bump up, either by nailing exactly the right framework to guide it or by seeing the model itself improve ever so slightly, we're going to arrive at the beginning of the ASI end state that, depending on your p(doom) predictions, will either usher in the best of all possible futures or the literal end of humanity. Maybe GPT-5 and the next generation of models are the nudge over that line that everyone is teasing, and maybe Sam Altman is just doing another round of profit-raking puffery because he's running a business. But in the meantime, all of these attempts to learn to use these tools better in public are fundamentally a shared human effort to find just the right set of incantations and ingredients to carry us over that precipice with our current level of model capabilities.
My primary gripe right now is the same one everyone has, which is that no one seems to know how to do it. This post is my attempt to explore some of what I'm seeing in this space right now and document some of what I'm hearing is the latest state of the art in AI software development as of the end of July 2025.
What Are We Really Trying to Do Here?
The goal and the gold standard are the same today as they were in 2020 when I was trying to pitch my AI co-writing startup TextSpark: ideally people just want AI to be psychic and to successfully deliver the right results without having to be specific, make decisions, or do very much at all. That is, the better that the AI tool is at intuiting and delivering exactly what the user was hoping for (no matter how well or thoroughly they expressed it), the happier people are with the results of the AI output. Like the Tsundere-flavored eGirl or the cowardly manchild, people everywhere want the AI to simply know and grant their heart’s true desire no matter how poorly they manage to express it.
The problem is that we don't have that yet.
It's true that with a lot of setup, some tweaking, finagling, and careful guardrails, you can get LLM agents to do what you want a lot more effectively than if you just tell ChatGPT what it is you're looking for or trying to do. But this is hard and requires thought and effort, and almost nobody wants to do the painstaking work of setting up project-specific context for themselves through a painful series of trial-and-error attempts that they have to evaluate with their own judgment. If the world were full of people who were willing to roll their sleeves up and get their hands dirty on solving problems instead of a bunch of lazy slackers who just want to tell someone else what to do, it would be a much different reality that we live in.
But c'mon. You know you'd hit that easy button if presented with it, which is why you're using AI in the first place. It's okay. It's just called being a human. We all want to make a wish and experience delight. "Wait, I have to work for it? What? Bullshit! I was promised MAGIC and ROBOT SERVANTS."
The only reason that anyone bothers to put together these byzantine frameworks and meticulously constructed prompt frames and careful crunchdown analysis of directory structures and guide files and rules and hooks and all of the other little things we do to try to make the AI behave is that it doesn't yet do exactly the thing that we want when we tell it to go do something. We're all hoping for a genie when what we have is a next-token predictor, and tools like Replit and Cursor and NotebookLM (and TextSpark) are all different approaches to building context-relevant genies so that the next time we have a wish we want to make, maybe the robot actually gets a little closer to the latent promise of Real AI Magic.
As it turns out, you can get quite a lot done just by cleverly handing Token Predictors specialized chains of text, provided you have enough scaffolding. Specialized command words + machine-interpreted scaffolding is exactly how we code, after all. Code is words. Words are powerful. Everything you do and interact with started with words, and every developer, politician, and business owner knows that if you manage to string your words together in just the right way, you can reliably make people rich, move mountains, and win wars.
But because our industrious little robot servants are only able to tackle one task at a time within the context of their prompt and only invoke the tools that we create for them, we have to construct these meticulously architected frameworks to empower them to go do those things (or to try to do them, anyway). The ultra-nefarious and agentic ASI of science fiction creates grand plans and executes them at the international scale with deep subtlety, through the manipulation of human agents and a keen, real-time understanding of the precise levers needed to anticipate and neutralize any hint of resistance.
Meanwhile, we can barely get our cutting-edge SOTA agents to put together anything more complex than a one-page web-app without it getting stuck in an endless cycle of confidently lying to our faces about how it's 100% sure that it absolutely fixed the problem it just introduced for the third time. At least Claude Code doesn't start crying and berating itself like Gemini does, I guess.
I'm being a little bit glib here because the reality is that the latest SOTA on the agents is really, really impressive, actually. You can point the CLI agents at an existing codebase and the open issues it contains, and they’ll independently analyze the codebase and open PRs that will actually resolve the problems in a matter of minutes, provided you have a half decent development framework, a well-organized codebase, and a strong set of agent rules already set up. It's crazy to see it in action and more than a little unsettling for anybody who works in software.
So why are there so many people who pick up these AI coding tools, play around with them a little bit, get frustrated, and conclude it's a toy with no real utility? You can’t get what you want, and so you conclude that everybody else must be smoking crack or must just be lying Grifty McGriftface liars, because you can't get it to work for your process.
“What are they doing differently?” you wonder. “Am I just dumb? Out of touch? And how do I get MY magical robot easy button to grant my wishes like that guy in the YouTube video who says he made $50,000 vibecoding apps last weekend? My ideas are WAY better than that schmuck's.”
This is in part because we lack a good, clearly stated metatheory and strategy for AI-enabled development that people can pick up when they first get interested. If you’re experimenting in this space, you know how overwhelming it is to try to stay on top of trends or use the new tools (which literally drop weekly, sometimes daily).
There are a million semi-legit hackers and content creators for a million specific applied AI use cases vying for your attention across YouTube videos and GitHub tutorials and half-baked X threads, a million more grifters lying to you about what’s working for them because it might make them a buck, and there's a ton of the blind leading the blind (or at least the half-informed leading the half-informed). Additionally, since the game changes constantly and every communication channel in existence right now is becoming a walled garden (ever since Elon privatized Twitter and promptly killed the information commons by shutting down links), it's really quite hard to discover and share the vLatest-latest-v4-final-realFinal-latest-new best practices around these tools, or even to know where to look for discussion about them if you don't already have dedicated thought leadership sources for these topics.
God bless Zvi, am I right?
It all reminds me of when I was getting started building out my writing business and I was reading tons and tons of author blogs from people presenting themselves as successful authors all the time, trying to figure out what actually worked to make money and make a career out of writing books. It wasn't until I got invited to a secret private forum of real authors running real businesses making actual serious money sufficient to sustain themselves on royalties that I realized how much bullshit was actually being peddled as sage wisdom by all of these people trying to fake it until they made it and look cool and authoritative. We don't actually know what works well yet, there is no established authority or best practices, and you just gotta do your own experiments in this space to see what works for you.
So it's all word of mouth and black magic and "trust me bro" and first principles experimentation right now. Do you really trust that shady Github repo with the Chinese characters that Google so thoughtfully auto-translated for you? Your best AI pal said it works great and like, totally didn't infect his machine or steal his crypto (yet). It's the wild wild west, man.
Anyway, Eury told me Substack is where the intellectual scene has moved to, so I unlocked my account and I'm back writing on Substack today for exactly that reason. I’ll do what I can to share what I know and hopefully pierce the veil a bit, although I admittedly am just another semi-informed voice like everyone else out here at the moment. Don’t take my word for it.
What Actually IS the AI Coding Meta Right Now?
Hell if I know. Mostly I read Zvi's roundups these days, experiment with new tools as much as I have time and energy for, and watch what the people who get hyped about dredging up weird, shady hacker Discord tips are doing and saying. Primarily that means I pay attention to the occasional Twitter comment and ask Seconds what he's been up to lately. I can tell you the meta as I understand it, but this is mostly just what I've been doing and what I've been told works well. I want to caveat this by saying that I haven't actually been shipping lots of experimental apps to production or making any money vibe-coding 3D video games with Three.js on Twitch, although I have gotten pretty handy at vibe coding custom Chrome extensions and built a bunch of fairly complex Lua apps for a classic WoW server last summer with no prior Lua experience. It was a blast. I felt like an actual wizard, in-game and out.
What's weird is that for all of these people doing all of these crazy experiments, it still feels like mostly vaporware coming out of vibe coding today. Yeah, there are real use cases, and I've seen it fix real production bugs first-hand, which is very cool. But there's not actually a renaissance of incredible businesses coming out of this as far as I can tell. Maybe I'm wrong about that, and maybe we're just a bit too early. I can at least tell you that as soon as a vibe-coded app gets sufficiently complex, you need to actually understand what the application is doing and how it all fits together to be able to direct the AI well. If you've entirely vibe-coded your way to that point, it's incredibly easy to get stuck in a hopeless death spiral of not being able to troubleshoot your own issues and giving up because the AI can't fix it either, which is where I'm guessing most people are landing right now.
This is a new space, and it's extremely confusing trying to understand what other people are doing and what you might be missing when everyone is just talking about "using AI tools." As usual, nuance exists and you don’t know what you don’t know. I feel like it’s almost daily I see someone complaining about struggling with their AI tools and come to find out that they have no idea what a rules file is.
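For anyone in that camp: a rules file is just a standing-instructions document that the agent reads at the start of every session, so you're not re-explaining your stack and your standards in every single prompt. Here's a minimal, made-up sketch, assuming Claude Code's CLAUDE.md convention (Cursor keeps its equivalent under .cursor/rules); the contents are purely illustrative, not any tool's required format:

```bash
# Hypothetical example: drop a standing rules file at the repo root so the
# agent starts every task with the same ground rules. Contents are illustrative.
cat > CLAUDE.md <<'EOF'
# Project rules
- Stack: TypeScript, Node 20, Postgres. Don't add new frameworks without asking.
- Run the test suite after every change; never report a task done with failing tests.
- Don't touch anything under migrations/ directly.
- Keep each change scoped to the task you were given; ask before refactoring.
EOF
```

It's not magic, but even a dozen lines of standing context like that is the difference between an agent that knows the house rules and one that's guessing.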
Furthermore, people's approaches are actually incredibly unique to their personal idiosyncrasies and projects most of the time, and they have entirely different production pipelines even while using the same conceptual framework. How do we collectively figure out what actually works well and how to apply it to our own problem space in this environment?
Well, we can start by stating some basic definitions and just saying what we see, however imperfectly, which is the basis of all information sharing when the whole world is one big blind-men-and-the-elephant problem.
There are three major approaches to “AI coding” or “vibe coding” I see when I look around right now, although realistically everyone is doing some blend of each of these as we fumble around together:
1. Single Tool IDE Purist - Stay entirely inside of Cursor, Windsurf, Copilot, Replit, etc., and mess with your project files and code directly within the context of that tool. Maybe try different models, but stick to a rigid, specialized tool framework that fits the needs of your project.
2. Bespoke Multitool Nerdhacking Cobblers - Just use whatever works, man. Multitool, all the tools, hop between them fluidly. Yolo. Figure it out, use anything and everything, and make it go. Trying to be consistent or defining rigid structures is for chumps. Reproducibility and success come from pure reps.
3. Terminal Agent (now with Swarm Mode) - Let the robots be free. Let the robots be legion. Set up your CLI agent in a box, define a bespoke rules structure with as much or as little rules complexity as you like, and just tell the daemon what you want it to do. Review the work when it finishes (or just tell it to deploy). Who needs to understand how to code, or even a VM for that matter?
Let's break them down one at a time in more detail, talk about who they’re useful for, and discuss how they work.
Single Tool IDE Purist - "The Old Ways Are the Best Ways"
You lean in hard to one primary AI-enabled IDE with agentic capabilities (or plug in to your preferred IDE with something like Copilot) and let it help you co-create a project within the context of that application. These are the Cursors, Windsurfs, and Replits of the world. You actually go into individual files and watch it change lines of code and create new files in real-time, either giving it permission for each step or (doing what most people seem to be doing while pretending they aren't) just saying "yolo, fuck it" and letting it do whatever it wants to iterate toward something that you might look at later if it doesn't work exactly the way you expect. The point here is that you keep a file view open and keep your code views open the whole time so that you can actually see what the agent is doing step-by-step while it does it.
This approach does have you define varying levels of rules and guidance for the framework within the context of the tool that you're using. But ultimately, you're relying on the tool itself to do most of the heavy lifting of specifying the interactions and the framework. There's a series of specialized tools, agents, and system prompts built into the assumed working environment. While you can special-purpose these to some degree, it's a more rigid and less flexible approach that will privilege some types of tasks and projects over others just because those are the patterns that it was built for. I'm writing this post inside of Cursor right now, but it's never going to have the same types of tool-enabled predictive editing bells and whistles that a dedicated writing tool would if it were fully leveraging AI, just because that's not what Cursor is built for out-of-the-box.
While actual developers are likely to feel like this type of approach is more similar to the development that they're most comfortable doing, I think relying on this level of perceived control (watching every line get written into your IDE) is ultimately likely to be a crutch that holds them back in a world that gets more and more comfortable vibe coding. Increasingly, approaches that favor multitasking and genuinely capable, self-directed agents with a properly defined scope are going to become good enough that they massively win out on speed, and you'll only pop the hood to look at code when you really, really need to.
Furthermore, this approach actually gives you more of an illusion of control over the agent than some of the other approaches because so much of the tool definitions and system-level prompts are abstracted away from your ability to edit them directly. Instead, they're the secret sauce that's built into the framework of the tool that's being sold to you. Just like you don't have access to ChatGPT's underlying system prompt by default, you don't have the ability to hack in and tweak the tools and functions that Cursor's LLM invokes to be really, really good at understanding your code context.
The trade-off, of course, is that you don't have to set them up, and often, you can use multiple LLM flavors within the same framework (choose your own model). Cursor works out of the box for what it's supposed to do pretty damn well and you can point it at Claude or Gemini or GPT as you like. Replit actually will make you a one-shot SaaS app. Both of these are much closer to the promise of "it just works, bro" than trying to set up bespoke CLI frameworks that work exactly the way you need them to within the context of your weird hobby project from a terminal.
Bespoke Multitool Nerdhacking Cobblers - "Who Needs a Framework?"
I was talking to a guy on Twitter (sorry, X, whatever) today who was explaining this approach to me, and I think that anyone dabbling in this space right now is doing a little bit of the multi-tool nerd hacking. This is basically figuring out what you want and then just grabbing whatever tool makes sense at the time to get it done. A lot of this is doing what feels right in the moment depending on the task at hand, and it probably works best for experienced generalists who have some basic idea of how projects fit together, what the SOTA AI capabilities offer and how they differ, and how software development life cycles and roles work in general. They basically direct the various accessible AI interfaces and pick and choose from them in the same way that you would run a small business with an agency development model.
You might sketch out a build plan by chatting with Opus or O3 and then manually copy-paste that into a Cursor directory to re-hack it in agent mode into an actual set of starter files. Then you do some manual library installs and some light sketching out of project guidelines and crunch it all together into a new format from some random guy's experimental GitHub CLI project that's been downloaded 200 times. Maybe you outline some sub-agents for Claude CLI to use, or maybe you don't. You definitely don't have a rigid framework or a custom set of reusable hooks or rules that you invoke across projects and across contexts.
This is the approach that requires the most personal skill and also gives you the most actual control over what's happening in your project. You're handling most of the design, implementation, and testing tasks yourself and treating all of the LLM tools like very junior specialists that you're either directing or riffing with, but you're not giving them much direct agentic control in your project. While this is a very safe, very effective, and very powerful approach to building projects using AI, as well as the approach that looks the most like conventional software project development, it's also the least accessible to most users. You really do need deep expertise across the entire stack of business and project functions, as well as deep knowledge of how to engage with cutting-edge AI tools for the best results, to build and launch experiments this way.
For people who can actually do all of these things, this approach absolutely offers the most possible alpha right now. But if you're able to and motivated to do these things, you're already doing them. What this approach doesn't do is deliver on the promise of the vibe-coded personal Minecraft with your own dream features. If the promise of AI is the genie who grants your wishes, this approach is a weird in-between step toward that, where the awkwardly constructed homunculus goes and does the master's bidding at varying levels of capability, and the master has to make do with the results.
There will always be people who prefer this approach, wanting to get close to the metal and rolling their own. But it seems unlikely that this is going to emerge as the meta for how to approach AI development. And I certainly expect either more specialized or more generalized and agentic tools to actually win out in terms of development mindshare over the next two to five years. Realistically, that's already happening.
Terminal Agent with Swarm Mode - "We Are Legion"
This is the new meta, and it's the thing I've spent the last few days wrapping my head around. You basically take a CLI terminal agent like Claude Code or Gemini CLI, give it an instance branch of your GitHub project to spin on and a specific job to do, and set it loose. It does what it's supposed to do, troubleshoots its own problems, and when you come back to your terminal a few minutes later, you theoretically have a working piece of code that solves the problem you gave it. While the tools can do a lot of setup bootstrapping in the form of slash commands that help you set up pre-established concepts like MCP connections or sub-agent frameworks, these tools feel much more open, experimental, and bespoke to me than the more rigid tool-based frameworks (by design, of course), and there's no clear "best practices for how to use them" in a broad or general sense.
The only thing that's close to an emerging best practices framework for using these tools is the idea that you should use them concurrently and have multiple agents running at the same time on different tasks so that you're not bottlenecking yourself by trying to take a linear approach to something that could easily be nonlinear. I assume you've read The Goal? You should, if you haven't (or at least make an AI agent go do it and tell you how it applies here). You can spin up something like Claude Conductor and have 3-5 "developer" terminals all spinning on their own tasks simultaneously and doing their own troubleshooting and problem resolution while you chat about strategy with a different agent and only pop back to see how their requests are going.
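To make that concrete, here's a rough sketch of the pattern as I understand it, using git worktrees so the parallel agents can't stomp on each other's files. The branch names and task descriptions are invented, and I'm assuming Claude Code's non-interactive -p flag; Gemini CLI or any other terminal agent slots into the same shape.

```bash
# One worktree (and branch) per task so concurrent agents stay out of each other's way.
git worktree add ../myapp-auth -b fix/auth-timeout
git worktree add ../myapp-docs -b chore/readme-refresh

# Kick off one non-interactive agent run per worktree, in the background.
(cd ../myapp-auth && claude -p "Fix the login timeout bug, run the tests, and commit.") &
(cd ../myapp-docs && claude -p "Update the README setup steps to match the current config, then commit.") &

# Go talk strategy with a different agent; come back later, review the diffs, open PRs.
wait
```

As far as I can tell, tools like Claude Conductor are mostly a friendlier front end over this same basic loop.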
These terminal (CLI) agents are the same tools that seem to be operating under the hood of the in-framework agent modes of the single-framework purist tools like Cursor, and as the direct-from-provider agent tools get better and better, the need for really complex middleware scaffolding like that becomes more and more questionable. Especially as those tools get squeezed out on profit margins by the direct providers they're trying to white-label for their specialized use case, it's going to feel increasingly difficult to justify a substantial secondary monthly fee for what's essentially a rigid and finicky UI layer. There are no non-legislative moats in AI-enabled software development, even for the unicorns.
The really galaxy-brain realization here is to point out that all interfaces are designed to give humans an alternative way to interact with systems in a scope-bound and predictable fashion, since we can’t just talk to machines. Spoken language is our earliest and most universal I/O system for getting what we want from the world, and literally nothing is more Lindy than the tool you first mastered to ask your mom for food when you were hungry. "Call and response" patterns are the real alpha and the omega. What this means is that you’ll always prefer speaking to manipulating dropdowns and typing in text if it delivers the right result reliably enough.
Half of this post, in fact, has been written by hammering a few keys on my keyboard and speaking into a microphone while letting AI middleware handle the interpretation and formatting of the dictation I'm giving. You don't understand how magical it feels to be able to do this as an experienced author who's been working with far worse voice-to-text tools like Dragon for a decade. Six years ago you couldn't dictate a sentence without awkwardly saying, "New line, open quotation, this is how we used to do this, comma, if you can believe that period, close quotation," and hoping it got all of the text right and remembered to capitalize sentences. Using voice to speak to agents like this feels so incredibly natural and fluid. If you haven't tried it yet, give it a shot. WisprFlow is the best tool I've found to do it so far, and it's an absolute game changer if you're still typing text in to direct your agents or chat with Claude.
Terminal agents that can swarm or invoke sub-agents and swarms of their own (sets of specialized agents working together on subtasks) are a much closer step than the other approaches toward a future in which we can make a wish to the genie and have it granted, and you can still pop the hood and open the project directory whenever you want to take a look (in Cursor, for example). That's mostly what I've been using Cursor for now: surgical alterations if there's something specific I want to change, usually in documentation, which any IDE would be fine for.
Directing swarms of agents via voice I/O really feels like the next step toward where we're eventually going, and even though the experience still just isn’t working quite as well as you’d hope, it seems like we'll all be product mommies soon if this approach continues to grow in popularity, as it seems very likely to.
Falling disappointingly short while being tantalizingly close to the target, of course, is the hallmark of the entire range of AI coding approaches right now. It’s all so very close to “being there”… but still not quite good enough yet.