The Human Web Is Becoming Agent Web
I'm joining Steel as founding growth lead. The web is shifting from human clicks to agent-run workflows. Steel aims to be the execution layer that makes agents reliable via traces + trust.
Detailed writeups on multi-agent orchestration, AI engineering patterns, and what actually works in production.
I'm joining Steel as founding growth lead. The web is shifting from human clicks to agent-run workflows. Steel aims to be the execution layer that makes agents reliable via traces + trust.
Same source, two channels. One middleware file parses Accept headers, falls back to UA sniffing for known bots, and respects sec-fetch-dest as a browser safeguard. HTML keeps a Link rel=alternate header so agents can discover the markdown variant. Plus llms.txt and llms-full.txt as prerendered static files. About 500 lines total.
The next evolution of extensibility isn't a better plugin system. It's agent-readable recipes that teach coding agents how to adapt external capabilities into your app's contracts. The extension is no longer just code — it's an installation procedure plus judgment.
Fine-tuned Qwen3 on Clojure. 30B SFT hits 83.8% best-of-16, smashing GPT-5.4's 64%. But RLVR with shaped rewards actually lowered the ceiling—the verifier loop matters more than the training method. Built a deployable agent from it. Data quality was the bottleneck all along.
AI agents can write code, but they can't prove it's correct, maintain architectural coherence, hold institutional context, or make value-laden decisions. These four problems are what stand between today's demos and tomorrow's production systems.
After building several personal autonomous loops, I kept returning to the simplest design: treat Codex or Claude as a short-lived worker inside a Bash loop. The shell owns state, scheduling, validation, recovery, and done conditions. The agent does one bounded unit of work and returns JSON.
Pretext didn't introduce the loop to me. It reinforced the stricter version: lock the architecture, measure against reality, isolate the miss, classify it, and use AI for throughput instead of authority.
I spent two weeks benchmarking agent skills on Steel. The surprising result wasn't that prompts matter. It was that variance is expensive, browsing makes it obvious, and the biggest unlock came from redesigning the CLI so the environment carried more of the load.
If you want an AI-native dev team, don't start with autonomous coding demos. Start with requirement quality, design systems, task schemas, centralized memory, and explicit validation. AI amplifies structure. It also amplifies chaos.
The printing press took 60 years to become economically sustainable. LLMs are on a similar diffusion path—broad adoption, shallow integration. The winners won't have model access—they'll have the complements: context, workflow, trust.
Request an AI summary of this blog