Feb. 14, 2026
I was under the weather last weekend, so progress has not been as fast as I would like, but we're back at it. Happy Valentine's Day!
I'm currently working on improving how Blawx uses AI to generate suggested encodings of legislative text. I'm VERY happy to be working on this problem, because everything I've done for the last year or so on Blawx has been in anticipation of being able to come back to this problem.
And the world of software development has shifted dramatically in the meantime. Two things in particular are inspiring me in thinking about how to solve the automatic code generation problem. Perhaps unsurprisingly, they both have to do with multi-agent coding systems.
The PolicyEngine guys have been doing some fantastically forward-thinking work in this space specifically to do with Rules as Code. About 90% of their strategy makes perfect sense to me. The parts that don't are of the "if that helps, I don't know why" variety, which I strongly suspect means I'm going to learn something. Gotta love it.
OpenAI published a blog post last week about an experiment in which they gave themselves the challenge of generating a software product where humans do not write or read any of the code. Making that work requires a bunch of approaches that they refer to as "harness engineering."
The ambition of this idea is genuinely astonishing, and that it worked is another degree of astonishing on top of that.
A lot of people in the software development business in the last year have been talking about how dramatically AI is changing software development. That has caused some existential angst in some places, particularly with people whose sense of identity and value was tied up with their ability to write good code. Before now, I haven't had that sort of emotional reaction to most developments in the AI software development space. I don't particularly like writing code as a way to spend my time. I like solving puzzles, but a lot of the complexity in software development is not intrinsic, and I'm happy to be rid of it. I like having things to show people, and writing code is just how you get those things.
"Harness Engineering" has been different, for me.
And not in the sense that I can see AI coming for my job, though I suppose I can. It's something else.
Harness Engineering is potentially recursive. We could use harness engineering to build code systems that are better at generating harnesses for other tasks. A given task can be broken down into smaller parts, with each part given its own harness framework. So the threshold of feasibility for what you can get agents to do in software development (excluding tasks, for the time being, that require human empathy and non-written communication skills) has not merely moved further away. It is more like it has disappeared entirely.
Harness Engineering is not fundamentally restricted to software development. Software development is just the first kind of human subject matter expertise to which it has been applied. And of course it is, because that's how the AI giants are hoping to accelerate their own businesses, and in terms of susceptibility to AI disruption, software development is low-hanging fruit: lots of good data, strict formalisms that allow for all kinds of automated guardrails, and existing tools for collaboration between humans in using those formalisms that expose an interface the AI can already use.
But those properties are not exclusively true of software development. They are most true of software development, but they can be made true of other types of subject matter expertise, too. Expanding this approach to things other than software development would require building systems that are as good as subject matter experts at new, more formalized versions of those tasks. Which until now has seemed more or less logistically impossible. But if you can use harness engineering to generate a software product that does that novel formalization task as well as a small handful of subject matter experts can, then you have the equivalent of an infinite supply of subject matter experts for that task, too.
I can't quite see where the boundaries of that idea are, anymore. And not being able to see the boundaries is what is giving me a strong emotional reaction. Until now I have been able to see the boundaries of what was possible with current and near-term approaches. I can't see it anymore.
Working on the new version of AI-powered code generation in Blawx is going to be a multi-step process. First, I'm going to build just enough infrastructure in the system to start exploring what coding agents are good and bad at when exposed to Blawx's data and invited to add to it and modify it. Second, I'm going to implement some of the low-hanging-fruit sorts of "harnesses" that will help improve performance, and validate that the system can do things it wasn't able to do before, and do the old things better.
I'm making good progress there. My current guess is we should get that far by the end of the month, and I should be able to show off some AI-generated Blawx encodings that far exceed the complexity and quality of anything that we have seen before.
Then there will be the perhaps bigger question, which is how to expose those capabilities to the human Blawx users in the app. I can do experiments inside an IDE, but that's not where Blawx users are supposed to live. And interacting with an agent working in your codebase is probably just a more specific version of the problem of interacting with anyone else working in your codebase. So the right way to expose agent behaviour inside Blawx is probably however you would expose the behaviour of any other collaborator.
Software development has version management, Blawx doesn't, yet. Giving people access to what agents can do in Blawx might require fixing that. But also, maybe not right away.
We'll see.