From Pilots to Production: California's Operational Playbook for Governed AI

Most California agencies are no longer asking whether to use AI. Employees are already pasting work into chatbots, and vendors are already embedding models into the products agencies buy. The real challenge has shifted: how do you take the experiments already running across your organization and move them into production, safely, defensibly, and at scale?

June 24, 2026

Shelley Ballard

This guide lays out a practical playbook for doing exactly that, drawn from a recent webinar conversation between Darwin AI's Chief AI Officer, Dustin Haisler, and Linxar Senior Managing Director, Bharat Bagaria. The full webinar replay is available to watch.

California wrote the rules early. The real test is operationalizing them.

California moved first. Executive Order N-12-23 staked out the state's position on safe, responsible public-sector AI, directing procurement reforms and introducing risk inventories and ethical guardrails. The statewide GenAI policy and the "Choose Your Own GenAI Journey" framework followed, and the legislative pipeline hasn't slowed since: roughly 30 AI-related bills have crossed into a second chamber, reaching from state operations down to local government. Add a federal government signaling that it wants a say in which models even reach the public, and agencies face a patchwork of overlapping rules, moving deadlines, and genuine uncertainty about whose law ultimately governs.

It's tempting to treat that uncertainty as a reason to wait for the courts. It isn't one. The rules you already operate under (acceptable use, data security, open records) apply to AI right now, which means inaction carries its own liability. The agencies pulling ahead aren't the ones with the cleverest new policy. They're the ones building on the foundation they already have, in a way that can flex as the rules change.

Why promising pilots stall on the way to production

A pilot that works in a demo and a system that works in production are two very different things. Three gaps account for most of the failures in between.

Visibility. Pilots tend to sprout in pockets (different vendors, different departments, different corners of IT) until no one can say what's actually running, what's working, or what's quietly failing. You can't manage, measure, or govern what you can't see, so visibility is the prerequisite for everything else.
Procurement. California already attaches a GenAI supplement to its contracts, but teams often sign it without fully grasping what it commits them to, and some hesitate to disclose GenAI use even when it's happening. Expectations and deliverables drift apart, and pilots stall in the gap.
Data readiness. This is the one that surprises people most. A pilot runs beautifully on a small, curated dataset, then breaks the moment real production data arrives: records that are old, incomplete, or scattered across systems that don't talk to each other. Garbage in, garbage out, and hallucinations stop being theoretical. More than any other factor, unready data is what keeps a successful pilot from becoming a production system.

What governance looks like once it leaves the PDF

For a lot of teams, "governance" means a document everyone signs and no one reads. Operational governance is different, and it starts with something concrete: an honest, real-time picture of where AI actually lives in your environment. Not what a survey claims, but what employees and vendors are genuinely doing. As Dustin put it, "You can't govern what you can't see."
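One rough way to start building that picture is from network logs rather than surveys. The sketch below assumes, hypothetically, that egress or proxy logs can be exported as one `department host` pair per line; it tallies requests to known AI tools by department and lists the ones outside an approved set. The catalog `AI_TOOL_DOMAINS`, its placeholder hostnames, and `SANCTIONED_TOOLS` are all illustrative assumptions; a real inventory would rely on a maintained catalog and richer telemetry.

```python
from collections import Counter

# Hypothetical catalog mapping hostnames to AI tools. The hostnames are
# placeholders; a real inventory would use a maintained catalog.
AI_TOOL_DOMAINS = {
    "chat.example-ai.com": "Example Chat",
    "api.example-llm.net": "Example LLM API",
    "copilot.agency.example.gov": "Agency Copilot",
}

# Hypothetical set of tools the agency has approved.
SANCTIONED_TOOLS = {"Agency Copilot"}


def inventory(log_lines):
    """Count requests to known AI tools per (department, tool).

    Assumes each log line is "<department> <host>"; hosts outside the
    catalog are ignored, and malformed lines are skipped.
    """
    usage = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip malformed lines rather than guessing
        department, host = parts
        tool = AI_TOOL_DOMAINS.get(host)
        if tool is not None:
            usage[(department, tool)] += 1
    return usage


def unsanctioned(usage):
    """Return sorted (department, tool) pairs that use unapproved tools."""
    return sorted(key for key in usage if key[1] not in SANCTIONED_TOOLS)


logs = [
    "HR chat.example-ai.com",
    "HR chat.example-ai.com",
    "Finance copilot.agency.example.gov",
    "IT docs.example.org",
]
print(unsanctioned(inventory(logs)))  # [('HR', 'Example Chat')]
```

Counting by department as well as by tool is a deliberate choice: it points the follow-up conversation at a specific team rather than at the whole organization.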

From there, the goal is guardrails that work at the level of everyday behavior. His analogy frames the problem well: deploying AI today is "like giving every government employee a McLaren F1, and putting them on an open road with no speed limit signs." Good governance doesn't confiscate the car; it sets boundaries clear enough that no one has to look them up in a manual.

In practice, that means intercepting risk in the moment rather than after the fact. Picture an employee pasting a constituent's email into a free chatbot to draft a faster reply: effective tooling redacts the PII before it leaves, flags the unsanctioned tool for IT, and coaches the employee without shaming them or shoving the behavior onto a personal device, which is exactly where over-restriction sends it. And when you do roll out policy, lead with the why; a rule that arrives with no context reads like punishment and kills adoption.
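The intercept-in-the-moment pattern can be sketched as a small function that sits between the employee and the destination tool. It swaps recognizable PII for placeholders, flags destinations that aren't on an approved list, and returns a plain-language coaching note instead of a hard block. This is a minimal sketch under stated assumptions, not a description of any particular product: the regex patterns are illustrative and far from complete, and `SANCTIONED_TOOLS`, `intercept`, and the hostnames are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical allow-list of approved AI tools (placeholder hostname).
SANCTIONED_TOOLS = {"copilot.agency.example.gov"}

# Illustrative patterns only; real PII detection needs much broader coverage
# (names, addresses, case numbers, context in free text).
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


@dataclass
class InterceptResult:
    text: str  # prompt with PII replaced by [LABEL] placeholders
    redactions: dict = field(default_factory=dict)  # label -> count removed
    flag_for_it: bool = False  # True when the destination is not approved
    coaching_note: Optional[str] = None  # shown to the employee; not a block


def intercept(prompt: str, destination: str) -> InterceptResult:
    """Redact PII from a prompt and flag unapproved destinations."""
    redacted = prompt
    counts = {}
    for label, pattern in PII_PATTERNS.items():
        redacted, n = pattern.subn(f"[{label}]", redacted)
        if n:
            counts[label] = n

    flagged = destination not in SANCTIONED_TOOLS
    note = None
    if flagged:
        note = (
            "This tool isn't on the approved list yet. Personal details were "
            "removed before sending, and IT has been asked to review the tool."
        )
    return InterceptResult(redacted, counts, flagged, note)


result = intercept(
    "Reply to jane.doe@example.com about SSN 123-45-6789, phone 916-555-0142.",
    "chat.example-ai.com",
)
print(result.text)  # Reply to [EMAIL] about SSN [SSN], phone [PHONE].
```

The function lets the redacted request proceed and returns a note rather than refusing outright, reflecting the point that over-restriction pushes use onto personal devices.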

There's no single right operating model. Agencies succeed with embedded governance, a center of excellence, a dedicated chief AI officer, or, increasingly at the state level, a hybrid of all three. The right structure is adaptive, shaped to your culture rather than stamped from a template.

A real use case, from intake to production

Social-services eligibility is a useful example because it maps cleanly onto the playbook. Today a caseworker hops across several aging systems to verify citizenship, income, and household details, run fraud checks, and reach a decision, work that can stretch to days per case.

The path to production follows a repeatable arc:

Intake—name the use case, the sponsor, and the business outcome (here, clearing the eligibility backlog).
Data—pull the sources together and assess the risk in disconnected legacy systems.
Procurement—nobody writes models from scratch anymore, so find a vendor with the right one.
Pilot—run a small, representative set of cases and check accuracy.
Production—roll out crawl-walk-run, with a human still in the loop.
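One way to keep that arc honest is to treat each step as a gate a use case can't skip. The sketch below is hypothetical throughout: the `UseCase` fields, the exit criteria, and the 0.95 pilot accuracy target are illustrative assumptions, not a prescribed standard. It encodes three points from the playbook: a measured baseline before any AI is involved, a single named owner, and a human reviewer who stays in place once the system reaches production.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Stage(IntEnum):
    INTAKE = 1
    DATA = 2
    PROCUREMENT = 3
    PILOT = 4
    PRODUCTION = 5


@dataclass
class UseCase:
    name: str
    sponsor: str
    owner: str  # one accountable owner, not a committee
    outcome_metric: str  # e.g. "average hours per eligibility case"
    baseline: Optional[float] = None  # measured before any AI is involved
    data_assessed: bool = False  # sources pulled together and risk-rated
    vendor: Optional[str] = None
    pilot_accuracy: Optional[float] = None  # on a small, representative sample
    human_in_loop: bool = True
    stage: Stage = Stage.INTAKE


# Hypothetical pilot target; each agency would set its own.
PILOT_ACCURACY_TARGET = 0.95


def exit_criteria_met(uc: UseCase) -> bool:
    """Return True if the use case may leave its current stage."""
    if uc.stage == Stage.INTAKE:
        named = bool(uc.sponsor and uc.owner and uc.outcome_metric)
        return named and uc.baseline is not None
    if uc.stage == Stage.DATA:
        return uc.data_assessed
    if uc.stage == Stage.PROCUREMENT:
        return uc.vendor is not None
    if uc.stage == Stage.PILOT:
        return (
            uc.pilot_accuracy is not None
            and uc.pilot_accuracy >= PILOT_ACCURACY_TARGET
            and uc.human_in_loop
        )
    return False  # PRODUCTION is the final stage


def advance(uc: UseCase) -> Stage:
    """Move to the next stage, or raise if the current gate isn't satisfied."""
    if uc.stage == Stage.PRODUCTION:
        raise ValueError(f"{uc.name} is already in production")
    if not exit_criteria_met(uc):
        raise ValueError(f"{uc.name}: {uc.stage.name} exit criteria not met")
    uc.stage = Stage(uc.stage + 1)
    return uc.stage


uc = UseCase(
    name="Eligibility backlog",
    sponsor="Program director",
    owner="Eligibility operations lead",
    outcome_metric="average hours per case",
    baseline=4.0,
)
advance(uc)  # uc.stage is now Stage.DATA
```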

The payoff in one comparable deployment: average processing time per case dropped from about four hours to three, roughly 20% saved per case, which across 200,000 cases produced a three-to-four-times return. Two principles make the difference. Don't pursue AI for its own sake; anchor every use case to a real problem you can measure before and after. And give each effort one clear owner. Govern by committee and too many hands turn into groupthink, and nothing ships.

The hidden 80%: The costs agencies forget to budget for

The most common budgeting mistake is assuming the software is the cost. It isn't; licensing typically runs just 10 to 15% of the total. The real money goes to data cleanup and migration (30–40%, and some California efforts spent two years getting data ready), plus the transformation, governance, and IV&V scaffolding (another 20–25%). Then there are the line items teams routinely overlook: change management, which has to begin before the first proof of concept, and ongoing model monitoring, which is heavier than anything in a traditional application stack. Budget only for the visible 20% and you'll be writing a budget change proposal before the project ships.
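The shares above also work in reverse: if a vendor quote covers only licensing, dividing by licensing's share gives a rough range for the whole project. The sketch below does that arithmetic with the percentages quoted in this section. The $150,000 quote is a made-up input, and the names `total_from_license` and `line_item` are hypothetical. Change management, monitoring, and anything else not given a percentage here are simply whatever the listed categories don't cover.

```python
from dataclasses import dataclass

# Shares of total project cost quoted in this section, as (low %, high %).
LICENSING_SHARE = (10, 15)
DATA_CLEANUP_SHARE = (30, 40)  # cleanup and migration
SCAFFOLDING_SHARE = (20, 25)  # transformation, governance, IV&V


@dataclass
class CostRange:
    low: float
    high: float


def total_from_license(license_cost: float) -> CostRange:
    """Back out a total-cost range from a licensing-only quote.

    If licensing is 15% of the total, total = license / 0.15 (the low end);
    if it is only 10%, total = license / 0.10 (the high end).
    """
    low_pct, high_pct = LICENSING_SHARE
    return CostRange(
        low=license_cost * 100 / high_pct,
        high=license_cost * 100 / low_pct,
    )


def line_item(total: CostRange, share: tuple) -> CostRange:
    """Apply a (low %, high %) share to a total range (the widest envelope)."""
    low_pct, high_pct = share
    return CostRange(low=total.low * low_pct / 100, high=total.high * high_pct / 100)


# Hypothetical licensing quote.
total = total_from_license(150_000)
print(total)  # CostRange(low=1000000.0, high=1500000.0)
print(line_item(total, DATA_CLEANUP_SHARE))  # CostRange(low=300000.0, high=600000.0)
```

In this made-up case, a $150,000 license implies a project of roughly $1 million to $1.5 million, with data cleanup alone plausibly costing two to four times the license.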

Start small and start now, especially if you're a small agency

A small agency without a dedicated AI team is often better positioned than a large one, because it can move quickly. You don't need a chief AI officer; you need a clear owner, a defined process, and a few guardrails. Get resourceful: partner with a local college looking for real-world test cases, or band together with a neighboring city or county to share the load.

Wherever you sit, the first moves are the same: see the AI already in your environment, apply the rules you already have, and bring your people along with honest change management. Inventory what's happening, then pick one use case and run it well.

Darwin AI and Linxar are partnering to help California agencies turn governance from a PDF into operational reality. To go deeper, watch the full webinar replay, then reach out for the session's follow-up resources: a use-case prioritization scorecard, a cost and ROI model, and a first-100-day plan.
