The New Developer Playbook
What OpenAI, Anthropic, and Cursor all agree on, and what it means for the future of software development
Happy Monday!
Three companies published AI coding guides this year. Reading them together reveals something nobody's saying out loud: the job description for "software engineer" just changed.
OpenAI released a comprehensive guide on building AI-native engineering teams. Anthropic published Claude Code best practices based on internal usage. Cursor's documentation and community have crystallized workflows that thousands of developers use daily. These aren't competing visions; they're all converging on the same playbook.
The pattern is unmistakable: specification-first, agent-as-implementer, human-as-reviewer. Engineers don't write code anymore. They direct agents who write code.
And the data backs it up: 41% of all code is now AI-generated. 84% of developers use AI tools daily. 25% of YC's Winter 2025 batch has codebases that are 95% AI-generated. This is the new normal.
OpenAI, Anthropic, and Cursor all published AI coding guides in 2025, and they converge on the same workflow. The pattern: explore → plan → implement → review. Agents handle first-pass implementation; humans handle direction and quality. All three emphasize persistent context, test-driven iteration where agents run until tests pass, and "thinking" triggers for complex problems. The data supports the shift: 41% of code is AI-generated, models can sustain 2+ hours of continuous work (doubling every 7 months), and developers using agents report 70% reduced time on tasks. The new skill isn't coding, it's specification and direction. Teams that nail this workflow ship faster.
The Convergent Workflow
All three guides describe variations of the same loop:
OpenAI's version: Plan → Design → Build → Test → Review → Document → Deploy. Agents contribute to every phase, with humans owning strategic decisions, architecture, and final approval.
Anthropic's version: Explore → Plan → Code → Commit. They're explicit: "Steps 1-2 are crucial—without them, Claude tends to jump straight to coding. Asking Claude to research and plan first significantly improves performance."
Cursor's version: Agent mode for implementation, Ask mode for planning. Use rules files to encode team conventions. Let the agent iterate until tests pass.
The shared insight? Slow down to speed up. Every guide emphasizes planning before implementation. Every guide treats specification as the critical skill. Every guide assumes the agent will do multiple implementation passes while the human reviews.
This inverts the traditional ratio. Engineers used to spend 80% of their time writing code and 20% reviewing it. Now it's flipping: 20% directing, 80% reviewing and refining.
The New Infrastructure: Memory Files
All three platforms converged on the same solution for persistent context: markdown files that agents read automatically.
OpenAI calls them AGENTS.md. Anthropic calls them CLAUDE.md. Cursor uses .cursorrules. Same concept, different names.
These files contain:
- Bash commands and build instructions
- Code style guidelines
- Testing requirements
- Repository conventions
- Project-specific context the agent needs
Anthropic's advice: "We recommend keeping them concise and human-readable... We occasionally run CLAUDE.md files through the prompt improver and tune instructions (adding emphasis with 'IMPORTANT' or 'YOU MUST') to improve adherence."
This is prompt engineering for codebases. The teams that invest in their memory files get better agent output. The teams that skip this step wonder why their agents keep making the same mistakes.
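To make that concrete, here is a minimal sketch of what such a memory file might contain. The commands, style rules, and conventions below are placeholders I made up for illustration, not examples taken from any of the three guides:

```markdown
# CLAUDE.md (example; all project specifics are placeholders)

## Commands
- npm run build        # build the project
- npm run test         # run the full test suite
- IMPORTANT: run the full test suite before every commit

## Code style
- TypeScript strict mode; avoid `any`
- Prefer named exports over default exports

## Conventions
- YOU MUST add or update tests for any behavior change
- Branch names: feature/<ticket-id>-short-description
- Never commit directly to main
```

Note the "IMPORTANT" and "YOU MUST" emphasis, which mirrors the tuning advice Anthropic describes above.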
OpenAI takes it further: connect agents to issue-tracking systems, have them read feature specs, cross-reference against the codebase, and flag ambiguities before implementation starts. The agent becomes the first-pass analyst, not just the first-pass coder.
Test-Driven Everything
Here's where all three guides get specific: test-driven development (TDD) isn't optional anymore. It's the control mechanism that makes agent-driven coding reliable.
Anthropic's workflow: "Ask Claude to write tests based on expected input/output pairs. Tell Claude to run the tests and confirm they fail. Ask Claude to commit the tests. Ask Claude to write code that passes the tests, instructing it not to modify the tests. Tell Claude to keep going until all tests pass."
OpenAI frames it similarly: agents perform best when they have a clear target, such as visual mocks or test cases.
This isn't TDD for philosophical reasons. It's TDD because agents need a verifiable goal. Without tests, you're debugging vibes. With tests, you're debugging failures (which agents are actually good at).
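As a toy illustration of that loop (mine, not from any of the guides): imagine asking an agent for a slugify helper. You commit failing tests like these first, then instruct the agent to implement the function without touching the test file:

```python
# test_slugify.py: committed before the implementation exists, per the
# workflow above. The first `pytest` run fails (the module isn't written
# yet), which is the "confirm they fail" step. The agent then implements
# slugify in slug.py until every test passes, without editing this file.
# Both the module name and the behavior here are hypothetical.
from slug import slugify


def test_lowercases_and_hyphenates():
    assert slugify("The New Developer Playbook") == "the-new-developer-playbook"


def test_strips_punctuation():
    assert slugify("Explore, Plan, Code, Commit!") == "explore-plan-code-commit"


def test_collapses_whitespace():
    assert slugify("  agents   write   code  ") == "agents-write-code"
```

With a target like this, "keep going until all tests pass" becomes an objective instruction rather than a vibe.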
The pattern extends beyond unit tests. Cursor users report success with "YOLO mode" (letting the agent run commands automatically) but only when paired with robust test suites that catch errors before they compound.
The Thinking Hierarchy
Anthropic revealed something specific about triggering deeper reasoning: the word "think" activates extended thinking mode. And it scales.
"These specific phrases are mapped directly to increasing levels of thinking budget: 'think' < 'think hard' < 'think harder' < 'ultrathink.'"
OpenAI's guide mentions similar dynamics: complex tasks benefit from asking the agent to plan before implementing, to verify reasonableness as it goes, and to check its own work against requirements.
Cursor's community discovered the same thing empirically: Agent mode for execution, Ask mode for planning. Switching between them deliberately produces better results than staying in one mode.
The meta-lesson: agents have gears. Learning when to shift matters.
What This Means If You're Building
The playbook is now public. The question is whether you'll follow it.
For individual developers:
- Invest in your AGENTS.md / CLAUDE.md / .cursorrules files. This is leverage.
- Use "explore, plan, code, commit" instead of "code, debug, code, debug."
- Write tests first. Let the agent iterate against them.
- Learn the thinking triggers; "think hard" literally buys a bigger thinking budget.
For engineering leaders:
- Your developers' job descriptions just changed. Update your hiring criteria accordingly.
- Specification quality is now a first-order concern. Vague tickets produce vague code.
- Review cycles matter more than implementation speed. The agent implements fast; the bottleneck is human judgment.
- Standardize your memory files. Every team reinventing this wastes cycles.
Here's one way I've put this into practice in my own coding workflow, and it's replicable at scale for individuals or teams. I use a combination of Claude Code and Cursor to build most software. In Claude Code I've defined clear, specialized agents (you can use /agents to build them): a Senior Product Manager and a Senior Systems Architect, both running Opus (Claude's beefiest model), handle the planning, refinement, and system design. Other agents running Sonnet do the actual front-end and back-end development. From there, the Systems Architect reviews the generated code to ensure there are no major errors, and the Product Manager checks that the product matches the original requirements at each step.
Once the plan is complete, the first step in development is to build extensive tests that the agents aren't allowed to adjust. If the tests fail, the code must be fixed. This iterative, multi-agent approach enforces quality at each step, and I manually review and test functionality as the app or product is built, so human review remains the final check throughout the process. This workflow has substantially increased the quality of the AI-generated code I ship.
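If you want to replicate the setup, Claude Code subagents are defined as markdown files with YAML frontmatter under .claude/agents/ (the /agents command scaffolds them for you). The sketch below approximates my Systems Architect agent; the description and instructions are illustrative, and you should check the current Claude Code docs for the exact frontmatter fields your version supports:

```markdown
---
name: systems-architect
description: Reviews plans and freshly generated code for architectural
  soundness. Use after the development agents finish a task, before merge.
model: opus
---

You are a Senior Systems Architect. For every change you review:
1. Re-read the original spec and the plan from the product manager agent.
2. Check the diff against the architecture and conventions in CLAUDE.md.
3. Confirm no tests were modified or weakened to force a pass.
4. Summarize risks, gaps, and open questions for human review.
```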
The Bottom Line
The three major AI coding platforms converged on the same workflow because it works. Explore before implementing. Specify before coding. Test before shipping. Review everything.
This isn't a story about AI replacing developers; AI has simply changed what developers do. The new job is specification, direction, and quality control. The old job, translating specs into syntax, is increasingly handled by agents.
The guides are public. The playbook is documented. The only question is whether your team adopts it before your competitors do.
Models can now sustain 2+ hours of continuous work, with task length doubling every 7 months. At that trajectory, an "entire feature implemented from spec" isn't a demo; it's just another Tuesday. The teams that learned the new workflow early will compound that advantage.
The teams still debugging vibe-coded spaghetti will wonder what happened.
In motion,
Justin Wright
If the three major AI coding platforms all converged on "specification-first, agent-as-implementer, human-as-reviewer," and if test-driven development is the control mechanism that makes it work, are we witnessing the emergence of a new engineering discipline where the primary skill is clarity of thought rather than fluency in syntax?

Sources
- Claude Code Best Practices - Anthropic
- Cursor Features and Documentation - Cursor
- 2025 Stack Overflow Developer Survey (AI Section) - Stack Overflow
- The State of Developer Ecosystem 2025 - JetBrains
- Measuring the Impact of Early-2025 AI on Developer Productivity - METR
- From Vibe Coding to Context Engineering - MIT Technology Review
- Vibe Coding - Wikipedia

I am excited to officially announce the launch of my podcast Mostly Humans: An AI and business podcast for everyone!
Episodes can be found below - please like, subscribe, and comment!