
My COO Is an AI

I structured my AI assistant like an organization: a chief of staff, two advisors, and domain experts. It sounds absurd. It works.

I experiment with technology on weekends. Multiple explorations running in parallel, each at a different stage. Keeping track of what I decided, why, and what's stalled across all of them was becoming impossible.

So I did what felt natural after years of managing a large account with multiple stakeholders: I gave my AI assistant an organizational structure.

One point of contact

Most people use AI as a question-and-answer tool. You ask, it responds, you move on. I wanted something different: a single AI that I talk to, that knows all my explorations, and that coordinates with specialized agents and advisors when the question is bigger than what it can handle alone.

I talk to one agent. It decides who else needs to be involved. Sometimes the answer is nobody. Sometimes it convenes two advisors with different perspectives. Sometimes it pulls in a domain specialist. But I never have to think about routing. One conversation, one point of contact.

The structure

Me: Direction, priorities, judgment calls. I decide what to explore and why.

The AI (chief of staff role): Cross-exploration awareness, proactive alerts, institutional memory, operational coordination. It tracks whether things are progressing, what's falling through the cracks, and what the data says about decisions I've already made.

Most questions, the chief of staff handles alone. It has enough context from session history and decision logs to give a solid answer. But for bigger decisions, it knows when to escalate.

It coordinates with two specialized advisors, each with a different thinking style:

  • The Theorist grounds discussions in research and frameworks. Cites cognitive psychology, systems theory, established methodology.
  • The Validator builds evaluation frameworks. Where the Theorist says "the research suggests X," the Validator says "here's how we'd test whether X actually works, with measurable criteria."

The advisors respond independently to avoid anchoring bias, then the chief of staff synthesizes, notes disagreements, and gives its own recommendation. One agent coordinating specialists, not a committee.
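The independence constraint is the important part: each advisor answers before seeing the other's reply. A sketch of that convening step, with made-up function and field names:

```python
from dataclasses import dataclass

@dataclass
class Opinion:
    advisor: str
    recommendation: str

def convene(question: str, advisors: dict) -> dict:
    # Each advisor answers independently -- none sees the others' replies,
    # which is what avoids anchoring bias.
    opinions = [Opinion(name, ask(question)) for name, ask in advisors.items()]
    # The chief of staff synthesizes afterwards: here we only flag
    # whether the recommendations diverge.
    positions = {o.recommendation for o in opinions}
    return {"opinions": opinions, "disagreement": len(positions) > 1}
```

Raw opinions come back unmerged, so disagreements survive into the synthesis instead of being averaged away.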

How a decision actually flows

Here's an example. I'd been noticing that my AI kept forgetting decisions between sessions, repeating mistakes I'd already corrected, losing the reasoning behind choices. I asked the chief of staff: "Should we build a proper memory system, or just accept the amnesia and re-explain things each time?"

Step 1: The chief of staff reviews context. Runs an audit and finds significant gaps: most explorations have no persistent knowledge, correction files aren't being loaded, and decision reasoning is routinely lost.

Step 2: Convenes the advisors with the audit data.

Step 3: The Theorist maps the problem onto the Atkinson-Shiffrin memory model from cognitive psychology. Session logs are raw sensory input that decays fast. What's missing is the encoding step: filtering raw input into something retrievable. Recommends a centralized semantic index with federated context stores.

Step 4: The Validator builds a measurement framework. Six metrics including recall accuracy, citation rate, and false memory rate. Defines concrete "done" criteria for each phase: "100% feedback files indexed, decisions loaded at startup, baseline dashboard live." Specifies failure modes and how to detect them.
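The Validator's framework amounts to named metrics with pass/fail thresholds. A toy version, using three of the six metrics named above; the target values here are illustrative assumptions, not the real criteria:

```python
from dataclasses import dataclass

@dataclass
class Metric:
    name: str
    target: float              # threshold that counts as "done"
    higher_is_better: bool = True

    def passed(self, observed: float) -> bool:
        # A rate like false_memory_rate must stay *below* its target.
        if self.higher_is_better:
            return observed >= self.target
        return observed <= self.target

# Illustrative targets only.
METRICS = [
    Metric("recall_accuracy", target=0.90),
    Metric("citation_rate", target=0.80),
    Metric("false_memory_rate", target=0.05, higher_is_better=False),
]
```

Making "done" a number rather than a feeling is exactly the Validator's job.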

Step 5: The chief of staff synthesizes. Both advisors agree the problem is retrieval, not storage. Recommends a phased build starting with the highest-value files: the 22 correction files the audit flagged. Proposes a 60-question evaluation suite to measure improvement.

Step 6: I decide. Agree with the phased approach. Build starts that weekend. The result was dramatic.

Total time for me: about 15 minutes of reading and deciding. The system did the research, the framework building, and the synthesis. I did the judgment.

Every session like this gets logged as dated minutes. Raw advisor opinions preserved. Synthesis kept separate. Decisions and action items explicit.
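The minutes format itself is simple: one dated file per session, raw opinions kept apart from the synthesis. A minimal sketch, assuming JSON files and hypothetical field names:

```python
import json
from datetime import date
from pathlib import Path

def log_minutes(outdir: Path, raw_opinions: dict, synthesis: str, decisions: list) -> Path:
    # One dated file per session; raw advisor opinions are preserved
    # verbatim, separate from the chief of staff's synthesis.
    record = {
        "date": date.today().isoformat(),
        "raw_opinions": raw_opinions,   # unmerged, per advisor
        "synthesis": synthesis,         # the merged recommendation
        "decisions": decisions,         # explicit decisions and action items
    }
    path = outdir / f"minutes-{record['date']}.json"
    path.write_text(json.dumps(record, indent=2))
    return path
```

Because the files are dated and append-only, any future session can cite the exact minutes behind a past decision.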

What this is not

This is not a chatbot with a fancy title.

A chatbot answers questions when asked. My system reads context proactively, notices when things stall, flags forgotten follow-ups, and connects patterns across explorations. A chatbot forgets when you close the tab. This system's memory spans sessions through versioned knowledge files.

This is also not "AI replacing thinking." The system produces zero value without me making decisions and exercising judgment. What it replaces is the cognitive overhead of keeping track of everything.

The compound effect

That memory architecture session didn't just produce a decision. It produced minutes that became institutional memory, searchable context for every future decision. This is compound engineering in practice: each session makes the next one smarter.

Such an elaborate organization for weekend tech musings sounds absurd. But it works.


I didn't set out to build an org chart for my AI. I set out to stop forgetting things. The structure emerged because it was useful.