ARCHITECTURE·MAY 16, 2026·12 MIN

INSIDE THE AGENTIC ROOFING OS

What runs at 3 a.m. when the office is empty: a tour of the agent loop, the tool graph, and the human-in-the-loop boundary.

BY Elliot Mizrahi·Principal Engineer, Agent Platform

Most marketing copy about "AI for roofing" treats the system as a black box. This article opens it. If you're going to run your operation on agents, you need to understand what the agents actually do, which tools they have access to, and where humans are still in the loop.

The agent loop, in detail

An agent in ROOF_OS is not a single LLM call. It is a loop:

1. Pick up an event (new lead, scheduled job approaching, payment overdue, weather alert).

2. Pull context (the lead's history, the job's status, comparable past decisions, current pricing data).

3. Decide the next action from a bounded action space.

4. Either take the action (within authority) or escalate to a human with a structured ask.

5. Write the action and the reasoning into the audit log.
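The loop above can be sketched in a few lines of Python. This is a minimal illustration, not the ROOF_OS implementation: the `AgentLoop` and `Decision` names, the callback signatures, and the contents of `ALLOWED_ACTIONS` are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical bounded action space; the real one is not described in this article.
ALLOWED_ACTIONS = {"send_quote", "reschedule_job", "send_reminder", "no_op"}


@dataclass
class Decision:
    action: str             # must be a member of ALLOWED_ACTIONS
    reasoning: str          # stored in the audit log alongside the action
    within_authority: bool  # False means a human has to approve


@dataclass
class AgentLoop:
    fetch_context: Callable[[dict], dict]
    decide: Callable[[dict, dict], Decision]
    execute: Callable[[Decision], None]
    escalate: Callable[[dict, Decision], None]
    audit_log: list = field(default_factory=list)

    def handle(self, event: dict) -> str:
        context = self.fetch_context(event)       # step 2: pull context
        decision = self.decide(event, context)    # step 3: decide
        if decision.action not in ALLOWED_ACTIONS:
            raise ValueError(f"action outside bounded space: {decision.action}")
        if decision.within_authority:             # step 4: act or escalate
            self.execute(decision)
            outcome = "executed"
        else:
            self.escalate(event, decision)
            outcome = "escalated"
        self.audit_log.append({                   # step 5: audit
            "event": event,
            "action": decision.action,
            "reasoning": decision.reasoning,
            "outcome": outcome,
        })
        return outcome
```

The point of the shape is that the model only ever appears inside `decide`; everything around it (context, authority check, logging) is ordinary code that can be tested without a model.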

Step 2 is where most agentic systems quietly fail. If the context fetch pulls back a partial picture — say, the lead's interaction history but not their insurance claim status — the agent will confidently make the wrong decision. We spend more engineering time on context retrieval than on model selection.
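One way to guard against a partial picture is to declare, per task, which context fields must be present before the agent is allowed to decide. The sketch below assumes a hypothetical `REQUIRED_CONTEXT` table and field names; the article does not say how ROOF_OS enforces this.

```python
# Hypothetical mapping from task type to the context fields it cannot decide without.
REQUIRED_CONTEXT = {
    "quote_followup": {"interaction_history", "insurance_claim_status"},
}


def missing_context(task: str, context: dict) -> set[str]:
    """Return the required fields that are absent or None for this task."""
    required = REQUIRED_CONTEXT.get(task, set())
    return {name for name in required if context.get(name) is None}


def ready_to_decide(task: str, context: dict) -> bool:
    """True only when every required field was actually fetched."""
    return not missing_context(task, context)
```

With a check like this, a context fetch that returns the interaction history but no claim status produces an escalation ("missing: insurance_claim_status") instead of a confident wrong answer.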

The tool graph

Each agent has a tool set. "Tool" here means a callable function or external API endpoint. The Quoting Agent's tool set looks like this: aerial measurement (one primary provider, one fallback), material price lookup (three supplier APIs), labor productivity lookup (internal), comparable-job pricing (internal historical data), CRM write, and email send.

Multiply this across all agents and you get the integration graph behind ROOF_OS — 2,000+ tools across roughly two dozen categories. The shape of the graph matters more than the size: an agent should reach for the cheapest, fastest tool that answers its question, and fall back gracefully when a tool fails.
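A minimal sketch of "cheapest first, fall back gracefully" might look like the following. The `Tool` record, the single `cost` number, and the `call_with_fallback` helper are assumptions for illustration; a production version would likely weigh latency as well and catch narrower exception types.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    name: str
    cost: float                  # relative cost per call (assumed metric)
    call: Callable[[str], dict]  # takes a query, returns a result or raises


class AllToolsFailed(Exception):
    """Raised when every tool that could answer the question has failed."""


def call_with_fallback(tools: list[Tool], query: str) -> tuple[str, dict]:
    """Try tools from cheapest to most expensive; return the first success."""
    errors = []
    for tool in sorted(tools, key=lambda t: t.cost):
        try:
            return tool.name, tool.call(query)
        except Exception as exc:  # broad on purpose in this sketch
            errors.append(f"{tool.name}: {exc}")
    raise AllToolsFailed("; ".join(errors))
```

Returning the tool name alongside the result lets the audit log record which provider actually answered, which matters when the fallback is less accurate than the primary.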

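Reliability is a property of the graph, not of any single tool. If your aerial provider has 99.5% uptime, solid by industry standards, and you have no fallback, your Quoting Agent can be no more available than 99.5%. Add one fallback of equal reliability whose failures are independent of the primary's, and at least one of the two is up about 99.997% of the time (1 - 0.005^2), comfortably past 99.99%. Tool redundancy is operational hygiene.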
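The arithmetic behind that claim is short enough to check directly. The sketch below assumes tool failures are independent, which is optimistic if both providers share a cloud region or an upstream data source; the function names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365


def combined_availability(*availabilities: float) -> float:
    """Probability that at least one redundant tool is up, assuming independent failures."""
    p_all_down = 1.0
    for a in availabilities:
        p_all_down *= 1.0 - a
    return 1.0 - p_all_down


def downtime_hours_per_year(availability: float) -> float:
    """Expected hours per year with no working tool."""
    return (1.0 - availability) * HOURS_PER_YEAR


single = combined_availability(0.995)           # one provider, no fallback
paired = combined_availability(0.995, 0.995)    # primary plus one equal fallback
print(f"{single:.4%} -> {downtime_hours_per_year(single):.1f} h/year down")
print(f"{paired:.4%} -> {downtime_hours_per_year(paired):.2f} h/year down")
```

Under these assumptions a single 99.5% provider is down roughly 44 hours a year, while the pair is down about 13 minutes.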

PART OF

THE AUTONOMOUS ROOFING OPERATOR PLAYBOOK

A field-tested playbook for running a roofing company where agents quote, schedule, dispatch, and follow up — while owners decide.

Frequently Asked Questions

Which model powers ROOF_OS agents?

It is model-pluggable. Different agents run on different models depending on the task — short-context routing tasks on smaller models, long-context drafting tasks on larger ones. We benchmark and rotate as new models ship.

What happens during a model outage?

Each agent has a fallback model and a degraded-mode policy. If both the primary and the fallback model are unavailable, agents pause new actions and the escalation queue absorbs everything. We have never had a full outage longer than 12 minutes.
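As a rough sketch of that policy: the three-state `Mode` enum and the routing function below are assumptions, and "degraded" is taken here to mean "running on the fallback model", which the answer above does not spell out.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"      # primary model available
    DEGRADED = "degraded"  # primary down, fallback model available
    PAUSED = "paused"      # both down: no new agent actions


def select_mode(primary_up: bool, fallback_up: bool) -> Mode:
    if primary_up:
        return Mode.NORMAL
    if fallback_up:
        return Mode.DEGRADED
    return Mode.PAUSED


def route_event(event: dict, mode: Mode, escalation_queue: list) -> str:
    """Send the event to a model, or park it for a human when agents are paused."""
    if mode is Mode.PAUSED:
        escalation_queue.append(event)
        return "queued_for_human"
    return "primary_model" if mode is Mode.NORMAL else "fallback_model"
```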

How do I audit a specific decision the agent made?

Every action has a decision log entry with the inputs, the reasoning trace, and the resulting tool calls. Search by job, agent, or timestamp.
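To make the shape of an entry concrete, here is a hypothetical record and an in-memory search over it. The field names and the `search` helper are illustrative assumptions, not the ROOF_OS schema or API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class DecisionLogEntry:
    job_id: str
    agent: str
    timestamp: datetime
    inputs: dict                 # what the agent saw
    reasoning: str               # the reasoning trace
    tool_calls: tuple[str, ...]  # tools invoked as a result


def search(
    entries: list[DecisionLogEntry],
    *,
    job_id: Optional[str] = None,
    agent: Optional[str] = None,
    since: Optional[datetime] = None,
    until: Optional[datetime] = None,
) -> list[DecisionLogEntry]:
    """Filter by job, agent, and/or time window; results are oldest first."""
    results = []
    for entry in entries:
        if job_id is not None and entry.job_id != job_id:
            continue
        if agent is not None and entry.agent != agent:
            continue
        if since is not None and entry.timestamp < since:
            continue
        if until is not None and entry.timestamp > until:
            continue
        results.append(entry)
    return sorted(results, key=lambda e: e.timestamp)
```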

Can I write my own custom agents?

Yes, via the agent platform interface. Most operators do not. The four built-in agents cover ~95% of the work, and the customization that does happen is in policies and prompts, not in net-new agents.

Is my data used to train models?

No. Customer data is isolated to your tenant. We do not train on customer data, and the underlying model providers we use are configured with zero data retention for training.