Embedding AI into a global scholarship platform

Shipping AI features — an authoring assistant for scholarship calls, a guidance chatbot for applicants, automated post and course generation — into a platform serving millions of users, without turning the product into a wrapper around an LLM.

Context

By 2024, the scholarship platform had grown beyond its original scope: scholarships, courses, video podcasts, short-form video, multiple countries, multiple authoring teams. The bottleneck was no longer engineering throughput; it was content velocity. Authors inside Santander needed to produce scholarship calls (“convocatorias”), course descriptions and supporting content faster than the existing editorial process allowed, while applicants on the other side needed more guidance than the static help pages could provide. AI was the obvious lever. The harder question was how to embed it so it amplified the existing product rather than replacing it.

Constraints

  • Bank-grade product: any AI surface had to be reviewable, auditable and reversible.
  • Multi-language and multi-country: outputs had to respect locale-specific conventions, regulated copy and translated UX.
  • No compromise on the Core Web Vitals gains: AI surfaces had to load lazily and stay off the critical path (see the sketch below).
  • No compromise on accessibility: generated content had to meet the same semantic and screen-reader bar as authored content.
  • Output quality had to be trusted by editorial teams, not just by engineers.
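
To make the lazy-loading constraint concrete, here is a minimal sketch; the module path and custom element name are hypothetical. The idea is that nothing LLM-related is fetched until the user actually opens the surface.

```typescript
// Hedged sketch: defer the AI surface until first interaction so the
// LLM-facing code never lands on the critical rendering path.
// The module path and element name below are hypothetical.
const assistantToggle = document.querySelector<HTMLButtonElement>('#open-assistant');

assistantToggle?.addEventListener('click', async () => {
  // Dynamic import: the chunk containing the assistant Web Component is
  // only fetched on demand, so initial CWV stays untouched.
  await import('./features/ai-assistant/assistant-panel.js');
  document.body.appendChild(document.createElement('ai-assistant-panel'));
});
```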

Decisions

  • Treat AI features as augmentation, not replacement: every AI surface produces a draft that a human reviews, never a final output that bypasses editorial.
  • Build them as feature modules that follow the existing feature-boundary architecture; no special-snowflake AI folder outside the normal structure.
  • Ship the chat and assistant UI as components in the LitElement library so they’re consistent, accessible and reusable across surfaces.
  • Make the LLM a backend concern: the frontend talks to internal APIs, not to providers directly.
  • Cache aggressively and stream responses for perceived performance.
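
As a sketch of what that backend mediation looks like from the client side, assuming a hypothetical internal /api/assistant/stream endpoint: the frontend never holds a provider key, it just consumes a token stream from our own API.

```typescript
// Hedged sketch: the frontend calls an internal endpoint, never an LLM
// provider directly. The backend owns keys, prompts, filtering and caching.
// The endpoint and payload shape are hypothetical.
export async function* streamAssistantDraft(
  brief: Record<string, string>,
  signal?: AbortSignal,
): AsyncGenerator<string> {
  const response = await fetch('/api/assistant/stream', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(brief),
    signal,
  });
  if (!response.ok || !response.body) {
    throw new Error(`Assistant request failed: ${response.status}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // Yield decoded chunks as they arrive so the UI can render partial output.
    yield decoder.decode(value, { stream: true });
  }
}
```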

Implementation

  • Authoring assistant for scholarship calls: a side panel inside the AEM authoring flow that drafts call descriptions, eligibility criteria and FAQs from a short structured brief; the author edits and approves before publish.
  • Guidance chatbot for applicants: a conversational surface that walks users through choosing the right scholarship and completing the application, integrated with the existing application flow and analytics.
  • Automated post and course generation: pipelines that produce draft articles and course outlines that editorial then refines; nothing auto-publishes.

Every AI surface is a Web Component from the shared library, and every interaction is instrumented for analytics. The assistant and chatbot stream responses so the first token shows up fast, and common chatbot queries are served from cached templated responses so we’re not paying for tokens on every “how do I apply?”.
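
To illustrate the templated-response caching, a minimal sketch; the endpoint, the answer text and the matching strategy are all hypothetical, and a production version would want fuzzier matching and locale fallbacks.

```typescript
// Hedged sketch: answer the most common chatbot questions from locale-aware
// templates, and only fall through to the LLM-backed internal API when there
// is no match. All names and endpoints are hypothetical.
const templatedAnswers: Record<string, Record<string, string>> = {
  'en-GB': { 'how do i apply': 'Open the scholarship call and select “Apply”…' },
  // …one map per supported locale
};

function normalise(question: string): string {
  return question.trim().toLowerCase().replace(/[?!.]/g, '');
}

export async function answerQuestion(question: string, locale: string): Promise<string> {
  const cached = templatedAnswers[locale]?.[normalise(question)];
  if (cached) return cached; // no tokens spent on the highest-frequency queries

  const response = await fetch('/api/chatbot/answer', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ question, locale }),
  });
  return (await response.json()).answer as string;
}
```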

Outcome

Authoring teams produce scholarship calls and supporting content meaningfully faster, with editorial review still in the loop. The chatbot handles a substantial slice of guidance questions that would otherwise hit support or be abandoned. Generated content meets the same accessibility and structural bar as human-authored content because it ships through the same components. The AI surface stayed off the critical path — CWV held steady through the rollout.

Outcome metrics

  • Performance: AI surfaces lazy-loaded; CWV held through rollout
  • Developer experience: AI features live inside the standard feature-boundary architecture
  • Scale: shipped into a platform serving 7M+ users across 20+ countries
  • AI integration: authoring assistant, applicant chatbot, content generation pipelines

Tradeoffs

We chose draft-then-review over autonomous generation. Faster pipelines are technically possible — and tempting on a content-heavy product — but the editorial trust required for autonomous publishing on a bank-regulated surface isn’t there yet, and earning it requires draft-then-review as the on-ramp. Anything that bypasses editorial is a regression, not a feature.

We chose internal API mediation over direct LLM access from the client. Calling providers directly would have been simpler to ship and lighter on the backend, but it would have made cost control, prompt management, content filtering and provider switching impossible to centralise. The internal API is the lever — it stays.
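
A rough sketch of what keeps that lever in one place. The interface and names below are hypothetical, but the shape is the point: a single backend seam where cost tracking, content filtering and provider swaps happen, with nothing provider-specific leaking to the frontend.

```typescript
// Hedged backend sketch: all provider traffic goes through one interface,
// so cost accounting, filtering and provider swaps stay centralised.
// The interface, class and token estimate are hypothetical.
interface CompletionProvider {
  complete(prompt: string, opts: { maxTokens: number; locale: string }): Promise<string>;
}

class MediatedAssistant {
  constructor(
    private provider: CompletionProvider,
    private recordSpend: (feature: string, approxTokens: number) => void,
    private filterRegulatedCopy: (text: string, locale: string) => string,
  ) {}

  async draft(feature: string, prompt: string, locale: string): Promise<string> {
    const raw = await this.provider.complete(prompt, { maxTokens: 1024, locale });
    this.recordSpend(feature, Math.ceil(raw.length / 4)); // per-feature cost tracking
    return this.filterRegulatedCopy(raw, locale);          // filtering before editorial sees it
  }
}

// Swapping providers means swapping the CompletionProvider implementation;
// the frontend contract never changes.
```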

We deliberately did not build a generic “AI everywhere” surface. Each AI feature targets a concrete bottleneck (authoring speed, applicant guidance, content scale-out). Generic assistants tend to be impressive in demos and underused in production; targeted assistants are the opposite.

Engineering challenges

  • Streaming UX — making partial responses feel intentional rather than jittery, especially in the chatbot where users expect conversational pacing (a pacing sketch follows this list)
  • Cost control — caching templated responses, batching where possible, and aggressively measuring per-feature token spend
  • Provider portability — the backend abstracts the LLM so switching models or providers doesn’t require frontend changes
  • Accessibility of generated content — generated articles get the same semantic structure pass as human-written ones; generated UI elements get the same focus and ARIA treatment
  • Multi-language quality — locale-specific prompts and review processes, not a one-size-fits-all English-first approach
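
On the streaming-UX point above, a minimal sketch of the pacing idea, assuming tokens arrive as an async iterable like the earlier streaming helper; the interval and chunk size are illustrative, not measured values.

```typescript
// Hedged sketch: network chunks arrive in bursts; draining them through a
// fixed-interval queue makes the chatbot's partial responses feel paced
// rather than jittery.
export async function renderPaced(
  chunks: AsyncIterable<string>,
  write: (text: string) => void,
  intervalMs = 30,
): Promise<void> {
  const queue: string[] = [];
  let finished = false;

  // Producer: pull chunks off the network as fast as they arrive.
  const producer = (async () => {
    for await (const chunk of chunks) queue.push(...chunk.split(''));
    finished = true;
  })();

  // Consumer: release a few characters at a steady cadence.
  while (!finished || queue.length > 0) {
    const next = queue.splice(0, 3).join('');
    if (next) write(next);
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  await producer;
}
```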

What I learned

AI features are product work, not a separate technical track. Treating them like any other feature — same architecture, same components, same review process, same analytics — is what kept them from drifting into a corner of the codebase nobody else maintains. The most expensive AI mistake on platforms like this is letting AI features live outside the engineering norms.

The other lesson: latency UX matters more than model choice. A faster, cheaper model with good streaming and caching often beats a slower, premium model — especially for the chatbot, where users abandon if the first token takes more than a second.

What I would do differently

I’d invest in evals from day one. We measured product outcomes (authoring speed, chatbot resolution rate) from the start, but we under-invested in offline quality evals for the generation pipelines. That meant some quality drift went unnoticed until editorial flagged it. Evals are cheap and pay back constantly; they belong in the foundation, not added later.
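
By evals I mean something as cheap as structural assertions run over a fixed set of briefs whenever a prompt or model changes. The checks below are hypothetical examples, not the ones we shipped.

```typescript
// Hedged sketch of an offline eval: structural checks over generated drafts,
// run against a fixed set of briefs whenever prompts or models change.
// Every specific check here is a hypothetical example.
interface EvalCase {
  brief: string;
  locale: string;
}

interface EvalResult {
  brief: string;
  passed: boolean;
  failures: string[];
}

export function evaluateDraft(draft: string, testCase: EvalCase): EvalResult {
  const failures: string[] = [];

  if (!/eligib/i.test(draft)) failures.push('missing eligibility section');
  if (!/<h[23]|##\s/.test(draft)) failures.push('no heading structure');
  if (draft.length < 400) failures.push('suspiciously short draft');
  if (testCase.locale.startsWith('es') && /\b(the|and|with)\b/i.test(draft)) {
    failures.push('possible English leakage in a Spanish-locale draft');
  }

  return { brief: testCase.brief, passed: failures.length === 0, failures };
}
```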

Future evolution

The next steps are richer authoring assistance (suggesting eligibility criteria from a corpus of past calls, flagging compliance issues before publish), a more deeply integrated applicant journey (the chatbot completing parts of the application form with the user’s confirmation), and AI-assisted internal tooling for the engineering team itself — code review, architecture exploration, and PR comment workflows. Some of this work is reflected in the PR-comments-to-plan workflow on the blog.

Principles applied

  • AI as augmentation, never as replacement
  • AI features live inside the standard architecture, not outside it
  • Draft-then-review until editorial trust is earned
  • Mediate providers through internal APIs — keep the lever
  • Generated content meets the same accessibility bar as authored content

This case study is the most recent thread of the platform’s evolution. It builds on the feature-boundary architecture (AI features fit inside it cleanly), the LitElement component library (AI surfaces ship as accessible components), the CWV rebuild (AI surfaces stay off the critical path) and the accessibility work (generated content inherits the same primitives).