This week I couldn’t shake an uncomfortable question while watching an agent close a PR for a colleague who hadn’t touched a single line of code: at what point did we go from programming to simply approving?
This isn’t a complaint. It’s a genuine question I think is worth asking before someone else answers it for us. Because if there’s one thing I’ve learned in 15 years of development, it’s that the big transitions in this industry don’t arrive with an announcement — they arrive quietly, one workflow at a time, until one day you look back and don’t recognize how you used to work.
I think we’re in one of those moments.
From “autocomplete my if/else” to “take over the office”
Android Studio announced native support for Gemma 4 running locally. What used to be autocomplete is now an agent that reads your Jira ticket, opens the branch, writes the tests, and submits the PR. All it’s missing is ordering the coffee.
When I saw it working for the first time, my honest reaction was a mix of genuine awe and something close to vertigo. Awe because technically it’s impressive — the agent doesn’t just write code, it understands context, interprets intent, anticipates edge cases. Vertigo because I asked myself: what’s left for us in that flow?
And here’s my personal problem with all of this. If all we’re doing is hitting the “Approve” button, in two years nobody will know why the system works. Or worse, why it broke. Because the knowledge isn’t in the code — it’s in the decisions made to write it. In why this architecture was chosen over that one. In why that module is separated. In the context that lives inside the head of whoever built it.
AI generates code. It doesn’t generate judgment. And when the system fails at 2 a.m. — and it will fail — someone is going to have to understand what’s happening. That someone can’t be another agent.
Meta and the model nobody saw coming
Zuckerberg launched Muse Spark this week and broke from the pattern Meta had been following. Unlike the Llama family, this model is closed — at least for now. Which already says something about where the company’s strategic bets are heading.
But what caught my attention wasn’t the announcement itself — it was the technical detail behind it. Muse Spark is natively multimodal with real-time reasoning capability, built specifically to “see” through cameras and interact fluidly with social networks. This isn’t a general-purpose model that was adapted. It was built from the ground up for that use case.
For those of us building products that touch the physical world — apps with computer vision, hardware integrations, experiences that blend the digital with the real — this matters. Today it might look like another launch in a year full of launches. But the infrastructure Meta has to distribute this, combined with the scale of its platforms, makes it very hard to ignore in twelve months.
It’s on my radar. I’d recommend putting it on yours too.
The technical debt nobody is measuring
This is what worries me most, and I’m not saying it as a theoretical exercise — I see it in real projects, week after week.
There’s an entrenched narrative that AI generates cleaner, more consistent code that’s less prone to human error. And that’s partly true. But there’s another side that doesn’t get talked about as much: AI also generates code at a speed no team can audit in real time. And when the pressure is to ship, when the client wanted the deploy yesterday, the temptation to approve without fully understanding is enormous.
The result is a new kind of technical debt. The old kind accumulated because you moved fast and took conscious shortcuts. The new kind can accumulate without you even knowing it’s happening, because the code looks clean, the tests pass, and everything seems fine until it isn’t.
I know because it happened to me, with my own code, before AI was even part of the equation. Imagine the scale of the problem when the generator works ten times faster than any developer.
The question I ask myself — and that I’d invite you to ask yourself — is simple: are we measuring success by delivery speed or by maintainability? Because those are two very different metrics, and optimizing only for the first is a decision you pay for eventually. Always later than you expect, but always.
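To make that trade-off concrete, here’s a minimal sketch of what measuring both metrics side by side could look like. Everything in it is hypothetical: the `PullRequest` fields, the sample sprints, and the idea of using human review comments per changed line as a rough maintainability proxy are my own illustration, not a standard metric.

```python
from dataclasses import dataclass

@dataclass
class PullRequest:
    lines_changed: int      # total lines added + removed in the PR
    review_comments: int    # human review comments left before merge
    hours_to_merge: float   # time from open to merge

def velocity(prs: list[PullRequest]) -> float:
    """Average hours to merge: the metric teams usually optimize."""
    return sum(pr.hours_to_merge for pr in prs) / len(prs)

def review_depth(prs: list[PullRequest]) -> float:
    """Human review comments per 100 changed lines: a crude proxy
    for how much of the merged code was actually scrutinized."""
    total_lines = sum(pr.lines_changed for pr in prs)
    total_comments = sum(pr.review_comments for pr in prs)
    return 100 * total_comments / total_lines

# Two hypothetical sprints: the second ships much faster,
# but almost none of its (AI-generated) code gets real review.
sprint_a = [PullRequest(120, 6, 8.0), PullRequest(80, 4, 5.0)]
sprint_b = [PullRequest(900, 2, 2.0), PullRequest(1100, 1, 1.5)]

print(velocity(sprint_a), review_depth(sprint_a))  # 6.5 h, 5.0 comments/100 lines
print(velocity(sprint_b), review_depth(sprint_b))  # 1.75 h, 0.15 comments/100 lines
```

A dashboard that only shows the first number would call sprint B a success; the second number is where the invisible debt shows up.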
Things I’m keeping an eye on
Anthropic blocked a launch. They detected thousands of external vulnerabilities in a new model and decided not to release it. The news passed relatively unnoticed in the middle of the week’s noise, but I think it’s one of the most important decisions I’ve seen an AI lab make in a long time. In an industry where the pressure to be first is constant, choosing not to launch is a signal worth noticing. “More powerful” doesn’t always mean “ready for production.”
Gemini 2.5 Pro is still my go-to for heavy reasoning. I tested several alternatives this month and for complex algorithms, deep logical analysis, and cases where I need the model to actually think before responding — I still haven’t found anything that beats it for that specific use case. For other things there are better options, but for that, it’s still my first choice.
So where are you at with all of this? Is AI already making architecture decisions in your projects, or are you still keeping full control of the process? It’s not a rhetorical question — I’m genuinely curious how teams of different sizes are handling this. See you in the comments.