Outcome metrics
- Performance: AI surfaces lazy-loaded; CWV held through rollout
- Developer experience: AI features live inside the standard feature-boundary architecture
- Scale: shipped into a platform serving 7M+ users across 20+ countries
- AI integration: authoring assistant, applicant chatbot, content generation pipelines
Tradeoffs
We chose draft-then-review over autonomous generation. Faster pipelines are technically possible — and tempting on a content-heavy product — but the editorial trust required for autonomous publishing on a bank-regulated surface isn’t there yet, and earning it requires draft-then-review as the on-ramp. Anything that bypasses editorial is a regression, not a feature.
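A sketch of how that gate can look in code; the status names and transition table here are hypothetical, not the platform's real content model. The point is structural: "published" is simply unreachable without a review step.

```typescript
// Hypothetical review-gate sketch: AI output can never skip editorial.
type DraftStatus = "ai_draft" | "in_review" | "approved" | "published";

const transitions: Record<DraftStatus, DraftStatus[]> = {
  ai_draft: ["in_review"],              // AI output always enters review
  in_review: ["approved", "ai_draft"],  // reviewers approve or send back
  approved: ["published"],              // only approved content can publish
  published: [],
};

function canTransition(from: DraftStatus, to: DraftStatus): boolean {
  return transitions[from].includes(to);
}
```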
We chose internal API mediation over direct LLM access from the client. Calling providers directly would have been simpler to ship and lighter on the backend, but it would have made cost control, prompt management, content filtering and provider switching impossible to centralise. The internal API is the lever — it stays.
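A minimal sketch of what that mediation layer centralises, assuming hypothetical names (`mediateCompletion`, `recordTokenSpend` and the template map are illustrative, not the real internal API):

```typescript
// Clients never hold provider keys; every call flows through this layer.
interface CompletionRequest {
  feature: "authoring" | "chatbot" | "content-gen"; // per-feature cost attribution
  prompt: string;
  locale: string;
}

interface LlmProvider {
  complete(prompt: string): Promise<string>;
}

// Prompt templates live server-side, keyed by feature and locale,
// so prompt management is centralised instead of scattered across clients.
const templates: Record<string, (p: string) => string> = {
  "authoring:en": (p) => `Draft funding-call copy for editorial review.\n\n${p}`,
};

function recordTokenSpend(feature: string, prompt: string, output: string): void {
  // Stand-in for real metrics: a rough character-based token estimate per feature.
  console.log(`[spend] ${feature}: ~${Math.ceil((prompt.length + output.length) / 4)} tokens`);
}

function filterContent(text: string): string {
  return text; // stand-in for the real centralised content filter
}

async function mediateCompletion(
  req: CompletionRequest,
  provider: LlmProvider, // injected, so a provider swap is a config change, not a client change
): Promise<string> {
  const template = templates[`${req.feature}:${req.locale}`] ?? ((p: string) => p);
  const prompt = template(req.prompt);
  const raw = await provider.complete(prompt);
  recordTokenSpend(req.feature, prompt, raw);
  return filterContent(raw);
}
```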
We deliberately did not build a generic “AI everywhere” surface. Each AI feature targets a concrete bottleneck (authoring speed, applicant guidance, content scale-out). Generic assistants tend to be impressive in demos and underused in production; targeted assistants are the opposite.
Engineering challenges
- Streaming UX — making partial responses feel intentional rather than jittery, especially in the chatbot where users expect conversational pacing (see the pacing sketch after this list)
- Cost control — caching templated responses, batching where possible, and aggressively measuring per-feature token spend
- Provider portability — the backend abstracts the LLM so switching models or providers doesn’t require frontend changes (see the adapter sketch after this list)
- Accessibility of generated content — generated articles get the same semantic structure pass as human-written ones; generated UI elements get the same focus and ARIA treatment
- Multi-language quality — locale-specific prompts and review processes, not a one-size-fits-all English-first approach
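On the streaming point, the core trick is decoupling token arrival from token rendering. A minimal sketch, where `paceStream` is a hypothetical helper and the token source is any async iterable:

```typescript
// Buffer bursty network chunks and flush on a fixed cadence,
// so the chat bubble fills at a steady conversational pace.
async function paceStream(
  source: AsyncIterable<string>,   // raw token chunks from the backend
  render: (text: string) => void,  // appends text to the chat bubble
  intervalMs = 50,                 // flush cadence tuned for reading speed
): Promise<void> {
  let buffer = "";
  const timer = setInterval(() => {
    if (buffer) {
      render(buffer); // flush whatever arrived since the last tick
      buffer = "";
    }
  }, intervalMs);
  try {
    for await (const chunk of source) {
      buffer += chunk; // absorb bursty arrival into the buffer
    }
  } finally {
    clearInterval(timer);
    if (buffer) render(buffer); // flush the tail
  }
}
```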
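And on provider portability, the shape is a thin adapter interface plus config-driven selection. A sketch with hypothetical adapters (neither class reflects a real vendor integration):

```typescript
// The frontend only ever sees the internal API; the backend maps a
// configured provider id onto a concrete adapter.
interface ChatProvider {
  stream(prompt: string): AsyncIterable<string>;
}

class ProviderA implements ChatProvider {
  async *stream(prompt: string) {
    yield `A: ${prompt}`; // a real adapter would translate the vendor's wire format
  }
}

class ProviderB implements ChatProvider {
  async *stream(prompt: string) {
    yield `B: ${prompt}`;
  }
}

// Selection is a config lookup, not a code change.
const providers: Record<string, ChatProvider> = {
  a: new ProviderA(),
  b: new ProviderB(),
};

function getProvider(name: string): ChatProvider {
  const provider = providers[name];
  if (!provider) throw new Error(`Unknown provider: ${name}`);
  return provider;
}
```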
What I learned
AI features are product work, not a separate technical track. Treating them like any other feature — same architecture, same components, same review process, same analytics — is what kept them from drifting into a corner of the codebase nobody else maintains. The most expensive AI mistake on platforms like this is letting AI features live outside the engineering norms.
The other lesson: latency UX matters more than model choice. A faster, cheaper model with good streaming and caching often beats a slower, premium model — especially for the chatbot, where users abandon if the first token takes more than a second.
What I would do differently
I’d invest in evals from day one. We measured product outcomes (authoring speed, chatbot resolution rate) from the start, but we under-invested in offline quality evals for the generation pipelines. That meant some quality drift went unnoticed until editorial flagged it. Evals are cheap and pay back constantly — they should be in the foundation, not added later.
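For concreteness, a minimal sketch of the kind of offline eval harness I mean; the case shapes and checks are illustrative. The value is that drift fails a CI job instead of waiting for an editor to notice:

```typescript
// Each generation pipeline gets a small fixed set of prompts plus checks.
interface EvalCase {
  prompt: string;
  checks: Array<(output: string) => boolean>; // cheap deterministic assertions
}

async function runEvals(
  generate: (prompt: string) => Promise<string>, // the pipeline under test
  cases: EvalCase[],
): Promise<number> {
  let failures = 0;
  for (const c of cases) {
    const output = await generate(c.prompt);
    for (const check of c.checks) {
      if (!check(output)) failures++;
    }
  }
  return failures; // non-zero fails the CI job
}

// Much of the drift we cared about was checkable without a judge model.
const cases: EvalCase[] = [
  {
    prompt: "Summarise the eligibility criteria for a funding call",
    checks: [
      (o) => o.length > 200,    // output not truncated
      (o) => /eligib/i.test(o), // stayed on topic
    ],
  },
];
```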
Future evolution
The next steps are richer authoring assistance (suggesting eligibility criteria from a corpus of past calls, flagging compliance issues before publish), a more deeply integrated applicant journey (the chatbot completing parts of the application form with the user’s confirmation), and AI-assisted internal tooling for the engineering team itself — code review, architecture exploration, and PR comment workflows. Some of this work is reflected in the PR-comments-to-plan workflow on the blog.
Principles applied
- AI as augmentation, never as replacement
- AI features live inside the standard architecture, not outside it
- Draft-then-review until editorial trust is earned
- Mediate providers through internal APIs — keep the lever
- Generated content meets the same accessibility bar as authored content
Related work
This case study is the most recent thread of the platform’s evolution. It builds on the feature-boundary architecture (AI features fit inside it cleanly), the LitElement component library (AI surfaces ship as accessible components), the CWV rebuild (AI surfaces stay off the critical path) and the accessibility work (generated content inherits the same primitives).