Locally great, globally drifting
One of my colleagues at Spatie, none other than the infamous Freek, a strong back-end engineer (I'm not saying this to butter him up) with less front-end experience (sorry), wanted to know how far AI could be pushed to write the entire front-end of a real product. A bold experiment or a reckless gamble? Doesn't matter! "There There" is a customer-support platform with tickets, channels, AI agents, an embeddable chat widget that ships to customers' own sites and so much more.
"Could you take a look at the front-end and evaluate?" he asked. "Is this actually production-worthy?"
So for two days, I did just that. TL;DR: the code is surprisingly good locally, and increasingly inconsistent globally. That's an interesting result, since the product itself works really well. We're using it internally as I write this. So, I wanted to talk about how I handled this code-quality check and how I used Claude to give me a hand.
AI doesn't get to review
Before I opened a single file, I made one decision: AI doesn't get to do the review for me. I could have asked it to review the codebase, published the output, and casually sat back for two days of Netflix. I obviously didn't do that. It would have produced a plausible review, not a real one. And I would have felt really guilty watching Netflix ...
A real review decides "this is a real bug", "this is a preference call", "this is fine", "this is absolutely not fine". AI is excellent at making lists. It's not great at owning an opinion.
Don't start with AI. I wanted to build my own mental model first: domains, boundaries, folder structure. Reading some code. Tracing a flow or two. Without that baseline, I'd just be rubber-stamping whatever came back from each prompt.
When AI comes in, use it mechanically. Like grep on steroids: count, list, find; things with answers I can eyeball. And as a candidate generator: flag anything that might be an issue based on my standards; I'll decide which ones actually are.
So the first few hours were just me, the There There folder tree, and a handful of files. No AI yet. Just trying to work out what domains "tickets", "channels", "brains", "recaps" actually were, how they related to each other and where the boundaries sat. I kept asking myself "Where would I put things if I were writing this myself?" I needed my own frame of reference before anything could push me toward a conclusion.
Then the moment arrived. I brought AI in. As a counter, and as a generator. For the generator, I wrote a long prompt asking it to find bugs, React anti-patterns, performance issues, security concerns, missing pieces, inconsistencies. Anything it could justify. It produced a ~600-line catalogue with severity tags and line numbers. More than I could have surfaced by hand. It was really impressive.

Then the real work started: going through that catalogue and validating every item. To be honest, this was a bit boring, but it helped me understand the codebase even better. Some items were real bugs. Some were real trade-offs. Some were style preferences wearing a severity tag. Some were the AI spotting one outlier and writing it up as a pattern. That validation is the review; everything else is just lists. Spoiler alert: I still haven't reached the end of the list. It was a lot of code and a lot of severity tags to validate in two days, and because of how entangled the code was across domains, I couldn't come up with a solution for every issue in time. The conclusion was already taking shape, though: globally, there were inconsistencies.
Pass 1: architecture
I had to zoom out first. Look at folder structure, import direction and where domains lived.
The structure consists of a couple of main folders: common/, modules/, pages/, with a separate widget/ for the customer-facing chat widget that ships to external sites. UI primitives come from shadcn/ui. Shared higher-level things sit under common/. Per-feature code lives under modules/tickets/, modules/chat-agent/ and a few more. Pages wire it all together. Honestly, it's a layout I'd probably start with myself if someone handed me a blank repo. I didn't need AI to help me understand this structure. But I did use it to double-check I hadn't missed anything obvious. No surprises so far.
Then I opened a few files in common/ and paused. AiSetupPrompt.tsx had exactly one consumer, inside workspace settings. ChannelBadgeList.tsx, the same. An AI-chat editor tree under common/components/ai-chat/ was used only by modules/agent-chat/. These aren't common. They're feature-specific components that landed in common/ because no one told the AI not to put them there.
This is where AI earned its keep. "For each file under common/components, list every file that imports it." A few minutes later I had the list, and the pattern was obvious. A handful of files that actually belong in a module if we'd had a rule about where things go. I recognized the shape because I've done it myself on other codebases. "Temporary" in common/ becomes permanent faster than anyone expects.
For every file under common/, list every file outside common/ that imports it. Give me the output as: common/path/File.tsx → [list of importers]. Sort by importer count ascending. Flag any file with 0 or 1 importers. Those are candidates for "this isn't actually common, it's feature code that landed here."
Verify import direction. Find any import where:
- common/ imports from modules/ or pages/
- modules/ imports from pages/
- modules/X imports from modules/Y (list both directions found)
- widget/ imports from anywhere outside widget/ and common/
List every directory under pages/ with more than 20 files. For each, give me: file count, and a one-line guess at whether the contents are (1) thin page orchestration, (2) domain logic that should live in modules/, or (3) mixed. Base the guess on filenames and top-level exports
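These prompts did the heavy lifting, but the importer count in particular is worth keeping as a script you can re-run after moving files around. A throwaway sketch in that spirit, assuming a conventional src/ layout and name-based imports (not There There's actual structure or tooling):

```ts
// countCommonImporters.ts: a throwaway review script, not There There's tooling.
// Assumptions: sources under src/, shared code under src/common/, imports that
// end with the file's basename (e.g. ".../common/components/AiSetupPrompt").
import { readdirSync, readFileSync, statSync } from 'node:fs';
import { basename, extname, join } from 'node:path';

function walk(dir: string): string[] {
  return readdirSync(dir).flatMap((entry) => {
    const full = join(dir, entry);
    return statSync(full).isDirectory() ? walk(full) : [full];
  });
}

const all = walk('src').filter((f) => ['.ts', '.tsx'].includes(extname(f)));
const commonFiles = all.filter((f) => f.includes('/common/'));
const outsideSources = all
  .filter((f) => !f.includes('/common/'))
  .map((f) => ({ f, text: readFileSync(f, 'utf8') }));

commonFiles
  .map((file) => {
    const name = basename(file, extname(file));
    // Crude name-based matching, but good enough for a review pass.
    const pattern = new RegExp(`from ['"][^'"]*\\b${name}['"]`);
    return { file, count: outsideSources.filter(({ text }) => pattern.test(text)).length };
  })
  .sort((a, b) => a.count - b.count)
  .forEach(({ count, file }) => console.log(`${count}\t${file}`));
```

Anything that prints 0 or 1 is a candidate for moving into the module that owns it.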
The other direction was worse. pages/settings/workspace/ has 75 files. Oh boy. After opening a few, it was clear that this wasn't settings UI. It was domain logic. Channel-setup wizards with DNS verification. Workflow editors with eight type-specific sub-editors. Billing with plan comparisons and Stripe integration. AI brains with MCP connection management. An entire product's worth of domains, never pulled out into modules. They lived inside a page folder because a page folder was where the work had started, and nothing ever moved.
Validating the import direction itself was actually clean. common didn't reach into modules or pages. modules didn't reach into pages. The widget was fully isolated. That's good. But none of it was written down anywhere, and that's the point. It was a convention that just so happened to hold ... for now. A future AI session that doesn't know the convention will break it the first time it's convenient. Rules like these need to be written down. If they had been, a lot of the global drift could have been avoided, or at least kept to a manageable amount.
Pass 2: components and hooks
After looking at the bigger picture, I zoomed in. I opened pages/tickets/Index.tsx and read it mostly end-to-end. Over a thousand lines. By the time I hit the bottom I had a clear feel: this isn't a page, it's an application living inside a page. Then I asked AI for numbers. It's really good at those: finding patterns and deviations against my standards.
For pages/tickets/Index.tsx, give me:
- useState count
- useRef count
- useEffect count, and for each useEffect: dependency array, and a one line read on whether it looks like a genuine side effect, derived state that should be useMemo or just an inline expression, "sync with prop" that's papering over a parent issue
- useCallback / useMemo count, and how many are passed to memoized children vs. just defensively wrapped
- Total prop count for most-propped components rendered in the file
- Any context value object literal, count its fields
Just give me the numbers. I'll decide what's bad.
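The useEffect line in that prompt leans on a three-way split worth spelling out. A quick sketch of the buckets, using a made-up component rather than code from the repo:

```tsx
import { useEffect, useState } from 'react';

type Ticket = { id: string; subject: string };

function TicketHeader({ ticket, filter }: { ticket: Ticket; filter: string }) {
  // 1. Genuine side effect: reaches outside React, belongs in useEffect.
  useEffect(() => {
    document.title = `Ticket: ${ticket.subject}`;
  }, [ticket.subject]);

  // 2. Derived state: no hook needed, just an inline expression during render.
  const matchesFilter = ticket.subject.toLowerCase().includes(filter.toLowerCase());

  // 3. "Sync with prop": state mirroring a prop via an effect, usually papering
  // over a parent problem. Lift the state up or key the component instead.
  const [draftSubject, setDraftSubject] = useState(ticket.subject);
  useEffect(() => {
    setDraftSubject(ticket.subject);
  }, [ticket.subject]);

  return <h1>{matchesFilter ? draftSubject : ticket.subject}</h1>;
}

export default TicketHeader;
```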
Count props for these components specifically: Composer, CommandPalette, AgentChatContext (the context value, not the provider). For each give me total props, how many are onX callbacks, how many are booleans, how many are domain objects.
The answers came back: 29 useState/useRef calls in that one file. 14 useCallback/useEffect pairs. A Composer with 17 props, 8 of them onX callbacks. A CommandPalette with 14 props, 11 of them commands. An AgentChatContext with 22 fields in its context value. I let AI count them twice, just to be extra sure, and sadly they didn't get any smaller. These aren't numbers I'd compute by hand, but they turned a vague "this feels too big" at first glance into a hard signal: god-components pretending to be components, where every new command or outcome widens the interface. "Death by a thousand callbacks". AI helped me pin those numbers down and base my conclusions on facts instead of a feeling.
Here's where the AI-generated catalogue started paying off more directly. A few findings I'd have spent days arriving at on my own. I still had to validate them, but I was impressed by what it surfaced once it knew what to look for:
- useDraftManagement stomps user input. The restore effect depends on [ticketUlid, mode, replyTemplate]. replyTemplate is a plain object prop from the page. Any re-render that produces a new object identity re-runs the effect, re-reads localStorage, and overwrites whatever the user is typing. In production, this silently eats in-progress draft work. Real bug.
- useChannel registers stale callbacks. Handlers are kept in a ref, but the ref is only iterated once, at subscription time. Echo keeps whatever callback it was handed first. Subsequent renders update the ref and nothing notices. Real bug.
- AgentChatHistory's group memo never recomputes. useMemo(() => groupChats(filteredChats, currentTicketRef.current?.id ?? null), [filteredChats, currentTicketRef]). currentTicketRef is a MutableRefObject, so its identity is stable for the entire lifetime of the component. The memo is pinned to first render forever. "Related to this ticket" shows the ticket that was active at mount. Silent bug (sketched below).
- forwardRef in two files, plain-prop ref everywhere else. MentionList.tsx and PlaceholderSuggestion.tsx still use forwardRef<Handle, Props>. Everything else uses ref?: Ref<T> as a plain prop, which is how React 19 wants you to write it. Two files is a coin-flip waiting to happen the next time the AI touches one of them.
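The third one is worth a closer look, because it's easy to write and completely silent when it breaks. A stripped-down reconstruction of the shape of the bug, with simplified names rather than the actual file:

```tsx
import { useMemo, useRef } from 'react';

type Chat = { id: string; ticketId: string | null };

function groupChats(chats: Chat[], currentTicketId: string | null) {
  return {
    related: chats.filter((chat) => chat.ticketId === currentTicketId),
    other: chats.filter((chat) => chat.ticketId !== currentTicketId),
  };
}

function ChatHistorySketch({ filteredChats }: { filteredChats: Chat[] }) {
  const currentTicketRef = useRef<{ id: string } | null>(null);

  // The bug shape: the ref object never changes identity, so this memo only
  // recomputes when filteredChats changes. Updates to ref.current are invisible
  // to the dependency array, so "related to this ticket" stays pinned to mount.
  const groups = useMemo(
    () => groupChats(filteredChats, currentTicketRef.current?.id ?? null),
    [filteredChats, currentTicketRef],
  );

  // The fix is to depend on a value, which means the current ticket id has to
  // live in state or a prop; refs are exactly the thing useMemo cannot see.
  return <p>{groups.related.length} related chats</p>;
}

export default ChatHistorySketch;
```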
And one where I overruled the catalogue.
It flagged a useState<string | undefined>(undefined) at TicketViewsSidebar.tsx as an "inconsistency with the codebase-wide useState<T | null>(null) convention". Technically true: 30 uses of null, one of undefined. But one case against thirty isn't a pattern, it's an outlier. A human glances at both and reacts differently. The AI fires the same flag for either, because all it sees is "inconsistency present." It has no sense of how much inconsistency there is, or whether the convention is already settled.
Locally good, globally drifting
By the end of two passes, a clearer pattern had shown up.
Individually, most files in this codebase are good. Consistent Tailwind via cn(), not a single template-literal class interpolation anywhere. Dark mode handled through semantic tokens (text-muted-foreground, bg-card), not dark: overrides scattered everywhere. Functional components, not const Foo = (props) => (...). Uniform naming. shadcn/ui as the one source of primitives. displayName set only where it actually matters. Open a random file in isolation and you'd mostly nod. To be honest: AI writes a single file well. It's just that the files that come after sometimes break the streak.
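To make "locally consistent" concrete, this is the shape almost every file follows. A made-up component; I'm also assuming the usual shadcn/ui location for the cn helper:

```tsx
import { cn } from '@/lib/utils'; // assumption: the standard shadcn/ui helper path

function StatusPill({ label, muted }: { label: string; muted?: boolean }) {
  // Semantic tokens (bg-card, text-muted-foreground) carry dark mode on their own;
  // cn() handles the conditional class instead of a template-literal interpolation.
  return (
    <span
      className={cn(
        'rounded-md bg-card px-2 py-1 text-xs text-muted-foreground',
        muted && 'opacity-60',
      )}
    >
      {label}
    </span>
  );
}

export default StatusPill;
```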
The drift shows up the moment files have to interact.
Three ways to render an empty state. One file uses all three of them at once. Two independent copies of formatRelativeTime with different thresholds and different casing ("Just now" vs. "just now"). The design token text-destructive in most places, hardcoded text-red-500 in others. Form errors through the <FormField> wrapper in ~12 files, and hand-rolled <p className="text-sm text-destructive"> copy-paste in 15+ other places. Status strings like 'open' | 'closed' | 'spam' | 'waiting' inlined 98 times across the tickets module. A matching WorkspaceMemberRole union already lives in the generated types, ready to be imported. Why didn't the AI use it? Because no one told it to.
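The status-string one has the least glamorous fix, which is probably why it never happened: one named union, imported everywhere. A sketch of the shape; the TicketStatus name, the path, and the labels here are mine, the real union already sits in the generated types:

```tsx
// modules/tickets/types.ts (hypothetical path): one source of truth for status.
export type TicketStatus = 'open' | 'closed' | 'spam' | 'waiting';

// Exhaustive by construction: add a status without a label and this fails to compile.
export const ticketStatusLabels: Record<TicketStatus, string> = {
  open: 'Open',
  closed: 'Closed',
  spam: 'Spam',
  waiting: 'Waiting',
};

// Call sites import the type instead of re-typing the union inline for the 99th time.
export function isTicketStatus(value: string): value is TicketStatus {
  return value in ticketStatusLabels;
}
```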
None of these are "the AI made a mistake". Each choice is reasonable in isolation. The problem is that there's no shared frame of reference across sessions. Every decision gets re-made from scratch, and N decisions later you have N variants. Local consistency is not the same as global consistency.
That's the thing that convinced me this review was worth turning into a document.
The ruleset
What came out of two passes is a small set of rules I'd commit to the repo as a README.md. Not a style guide. A ruleset. The kind of thing a linter can enforce:
The three layers.
- common/: used by more than two modules and feature-agnostic in its public API. No domain types in props or return shapes.
- modules/X/: domain-owned. Components, hooks, helpers, types for feature X.
- pages/X/: thin orchestrator plus any genuinely page-only layout.
Import direction: common ← modules ← pages. common imports nothing upstream. modules imports only from common. pages imports from both. Between modules, one direction only (lower-level domains like users and teams feed higher-level ones like tickets, never the reverse). We could enforce this via ESLint's no-restricted-imports. Mechanical rules can't be silently broken by a future session.
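A sketch of what that enforcement could look like; the src/ layout and the "@/" alias are assumptions about the setup, not the repo's actual config:

```ts
// eslint.config.ts: a sketch of the import-direction rule, not There There's config.
// Assumes sources under src/ and a "@/" path alias; adjust patterns to the real layout.
export default [
  {
    files: ['src/common/**/*.{ts,tsx}'],
    rules: {
      'no-restricted-imports': ['error', {
        patterns: [
          { group: ['@/modules/*', '@/pages/*'], message: 'common/ imports nothing upstream.' },
        ],
      }],
    },
  },
  {
    files: ['src/modules/**/*.{ts,tsx}'],
    rules: {
      'no-restricted-imports': ['error', {
        patterns: [
          { group: ['@/pages/*'], message: 'modules/ may only import from common/.' },
        ],
      }],
    },
  },
  {
    files: ['src/widget/**/*.{ts,tsx}'],
    rules: {
      'no-restricted-imports': ['error', {
        patterns: [
          { group: ['@/modules/*', '@/pages/*'], message: 'widget/ may only use widget/ and common/.' },
        ],
      }],
    },
  },
];
```

The between-modules direction (users and teams feed tickets, never the reverse) needs per-module entries or a dedicated plugin such as eslint-plugin-boundaries; the point is that the rule lives in tooling instead of in anyone's memory.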
One hook per file. One component per file. The exceptions are never worth the drift they create.
forwardRef is banned. React 19 treats ref as a plain prop. Use that.
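For the two stragglers the migration is mechanical. Roughly this shape; a made-up component, not the actual MentionList, which also exposes an imperative handle:

```tsx
import type { Ref } from 'react';

type SuggestionListProps = {
  items: string[];
  // React 19: ref is a plain prop on function components; no forwardRef wrapper.
  ref?: Ref<HTMLUListElement>;
};

function SuggestionList({ items, ref }: SuggestionListProps) {
  return (
    <ul ref={ref}>
      {items.map((item) => (
        <li key={item}>{item}</li>
      ))}
    </ul>
  );
}

export default SuggestionList;
```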
State ownership, by scope.
- Server data goes through Inertia props.
- Shared server truth that updates mid-session goes through useSharedData.
- State that belongs in the URL (selected ticket, filters, search) goes through URL params via a typed hook.
- Local UI state (dialogs, hovers, input) is useState.
- Cross-component state within one feature goes through a feature context that throws when used outside its provider.
In general: state lives as far from the DOM as possible, in this order: URL > shared > context > local.
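And the feature-context rule in code, since it's the pattern AI follows most reliably when it has an example to copy. A minimal sketch with a hypothetical ticket-filter feature:

```tsx
import { createContext, useContext, useMemo, useState, type ReactNode } from 'react';

type TicketFilters = {
  status: string | null;
  setStatus: (status: string | null) => void;
};

const TicketFiltersContext = createContext<TicketFilters | null>(null);

export function TicketFiltersProvider({ children }: { children: ReactNode }) {
  const [status, setStatus] = useState<string | null>(null);
  // Memoise the value object so consumers don't re-render on every provider render.
  const value = useMemo(() => ({ status, setStatus }), [status]);
  return <TicketFiltersContext.Provider value={value}>{children}</TicketFiltersContext.Provider>;
}

export function useTicketFilters(): TicketFilters {
  const ctx = useContext(TicketFiltersContext);
  // Throwing turns "used outside the provider" into a loud failure at dev time,
  // instead of a silent undefined that breaks three components downstream.
  if (ctx === null) {
    throw new Error('useTicketFilters must be used within <TicketFiltersProvider>.');
  }
  return ctx;
}
```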
Form errors go through <FormField>. No hand-rolled <p>.
The list is deliberately short. I don't want to "token out" the AI with too many rules. But with this list in place, most of what I found over two passes would have been caught at write-time. Given a ruleset in its context, AI follows rules pretty consistently. That's the part I find most encouraging about the whole exercise.
What I actually took away
Going in, as soon as I heard the front-end was entirely AI-generated, I half-expected a disaster, though I tried to keep an open mind: maybe it would amount to nothing more than a list of minor issues. I found something more interesting. This codebase is locally more consistent than I had any right to expect. Naming is uniform. cn() is applied consistently. Dark mode covers the surface. Most components are focused, modern, and actually functional.
And globally, with nothing to anchor to, it drifts away from that consistent layer you find locally. A single file gets written in the shape of "whatever felt right this session". A thousand files of "whatever felt right" is how you end up with three empty-state variants and two copies of the same helper. A developer who works on a codebase long enough eventually develops a taste for what "right" looks like there. The AI never does. Each session is a fresh start with no memory of what was decided yesterday. Must be terrible: an eternal first day on the job.
What I want to take away from this review isn't "don't use AI on the front-end and go watch some Netflix". It's more along the lines of: AI is a tool. If you use it for the front-end, write the ruleset first, commit it to the repo, and feed it back in every session. And go through the work with the AI instead of letting it take the reins.
Without AI, the review would have taken much longer, and I wouldn't have narrowed down the issues as quickly. Using AI for the repeatable, heavy, honestly-quite-boring tasks amplified my progress; it didn't replace it. The review turned into a ruleset, and that ruleset is what will (hopefully) keep the next few hundred files from repeating the first hundred's drift into a black hole of inconsistency.