I build AI-native education products from a real problem I face as a dad: helping parents raise thoughtful, resilient kids in the age of AI.
I built Hey Kiddo and Story Time by EE for children ages 3–9 after seeing the same need at home
and in my child's school: parents and teachers need better ways to turn everyday challenges into
meaningful lessons.
Technically, the product runs on a split-latency educational agent: a Supabase PostgreSQL
+ pgvector retrieval path grounds the parent guidance card in 3–5 seconds, while an async multimodal
runtime coordinates Gemini, ElevenLabs, Imagen, FastAPI, and SwiftUI to deliver a complete story-based
learning experience in under 90 seconds.
Why this stands out: this was not a thesis project or a generic AI demo. These products grew out of a problem
I face daily as a dad, and I refined them against the environments where young children actually learn:
at home and in school. The product goal is agency for parents and teachers, not dependency on AI.
Educational Agent Overview
What the educational agent owns end-to-end.
This case study starts from direct lived need. I built Hey Kiddo and Story Time by EE for children ages
3–9 after seeing the same learning problem at home and in my child's school: adults need fast,
grounded help turning difficult moments into teachable ones. The system interprets a family moment,
grounds its response with pgvector-backed retrieval, converts that moment into a teachable objective,
and delivers a story that helps adults teach cooperation, emotional regulation, empathy, and resilience.
Why This Problem Is Real
I'm a dad building for a real education problem I saw both at home and in my child's school.
The product is designed for ages 3–9, where narration, imagery, repetition, and emotional framing matter more than abstract instruction.
Translate messy parent input into a structured educational task the runtime can act on.
Retrieve grounded parenting context and return immediate guidance before long-running generation begins.
Map a stressful family moment into a concrete learning goal the parent or teacher can reinforce through storytelling.
Generate age-appropriate stories where the lesson is taught through narrative action instead of direct instruction.
Orchestrate narration, music, illustration, and delivery so the adult receives a usable teaching tool, not just text output.
Persist job state so the educational experience survives retries, app suspension, and partial failures.
Fast runtime handoff keeps the product responsive while the educational experience is assembled in the background.
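The first responsibility above, turning messy parent input into a structured task, can be sketched with a small schema. A minimal Python illustration; the field names, defaults, and normalization rules are assumptions, not the production schema.

```python
from dataclasses import dataclass

# Hypothetical agent-readable task; fields are illustrative, not the real schema.
@dataclass
class LessonTask:
    child_age: int       # drives age-appropriate framing for ages 3-9
    behavior: str        # e.g. "refused to share toys"
    context: str         # where the moment happened
    growth_goal: str     # the skill the adult wants to reinforce
    language: str = "en"

def interpret_input(raw: dict) -> LessonTask:
    """Normalize messy client input into a structured educational task."""
    return LessonTask(
        child_age=int(raw.get("age", 5)),
        behavior=raw.get("behavior", "").strip().lower(),
        context=raw.get("context", "home").strip().lower(),
        growth_goal=raw.get("goal", "emotional regulation"),
        language=raw.get("lang", "en"),
    )

task = interpret_input({"age": "4", "behavior": "  Refused to share toys "})
```

Downstream stages then act on typed fields instead of re-parsing raw prose at every step.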
Home + School
The product thesis comes from real caregiver and school-adjacent learning contexts, not hypothetical use cases.
Structured
The runtime moves from structured input to structured guidance to multimodal output, instead of behaving like a generic chat surface.
Agency
The system is designed to empower parents and teachers to teach better, not replace them with an automated answer machine.
Learning Thesis
The value is not content completion. The value is helping parents turn difficult moments into teachable stories that build lasting lessons and stronger agency.
Agent Runtime
Perceive, retrieve, decide, act, verify, recover.
The product experience depends on an educational agent runtime with two lanes: a pgvector-backed fast guidance
lane for the parent and a longer multimodal execution lane that turns the same moment into a complete
storytelling-based lesson.
Agent Control Loop
User + Client
Parent describes a stressful moment; the client captures structure, validates input, and packages the context for an educational response.
The app starts tracking only when the parent asks the agent to turn guidance into a complete story-based lesson.
↓
Grounding + Policy
JWT auth, rate limiting, embeddings lookup, and pgvector semantic retrieval over parenting_scripts establish safe, grounded educational context.
Gemini returns structured bilingual guidance, giving the parent immediate coaching before the storytelling flow begins.
3–5 sec grounded guidance. The parent can stop here or continue into a story-led lesson.
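The fast lane's retrieval step can be sketched as a single pgvector query. A hedged illustration: the table name parenting_scripts comes from the description above, while the column names and the choice of cosine-distance operator are assumptions.

```python
# Sketch of the fast-lane semantic search over parenting_scripts. The embedding
# column name and LIMIT strategy are assumptions; `<=>` is pgvector's
# cosine-distance operator.
RETRIEVAL_SQL = """
SELECT id, title, guidance
FROM parenting_scripts
ORDER BY embedding <=> %(query_embedding)s::vector
LIMIT %(k)s;
"""

def build_retrieval_query() -> str:
    """Return the parameterized query the edge function could run."""
    return RETRIEVAL_SQL.strip()

query = build_retrieval_query()
```

The query embedding would come from the same embeddings lookup mentioned above, so guidance is ranked by semantic closeness rather than keyword overlap.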
Data Layer
PostgreSQL + pgvector for retrieval
Storage for audio, images, and derived assets
Job table for status, retries, and public URLs
↓
Decision + Handoff
If the parent wants a full story, the runtime writes a pending job row and returns jobId in under 500ms.
Execution moves off-request so the agent can keep building the lesson without blocking the client connection.
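The handoff pattern above can be sketched without the real stack: persist a pending row, schedule the long lane, and return the job id immediately. A stdlib-only sketch using asyncio; the in-memory JOBS dict stands in for the Supabase job table.

```python
import asyncio
import uuid

JOBS: dict[str, dict] = {}  # stands in for the Supabase job table

async def run_pipeline(job_id: str) -> None:
    """The long multimodal lane, stubbed: story, audio, and art stages elided."""
    JOBS[job_id]["status"] = "running"
    await asyncio.sleep(0)
    JOBS[job_id]["status"] = "complete"

async def start_story_job(task: dict) -> str:
    """Fast handoff: write a pending row, schedule execution, return the jobId."""
    job_id = str(uuid.uuid4())
    JOBS[job_id] = {"status": "pending", "task": task}
    asyncio.create_task(run_pipeline(job_id))  # off-request execution
    return job_id

async def demo() -> str:
    job_id = await start_story_job({"goal": "sharing"})
    # The client already holds jobId while the pipeline is still warming up.
    await asyncio.sleep(0.01)
    return JOBS[job_id]["status"]

final_status = asyncio.run(demo())
```

The same shape holds in the real runtime: the request path only does the cheap database write, so the sub-500ms response budget is independent of how long generation takes.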
Agent Runtime (Railway) — 8 Stages
1. Interpret Input: normalize context, infer intent, and map to a lesson focus
2. Build Runtime Context: assemble prompt modules, learning objective, request variables, and tool config
3. Generate Story: Gemini story generation around the target lesson
4. Verify Safety: Gemini policy and moderation pass
5. Narrate: ElevenLabs language-aware voice
6. Compose Audio: pydub music blend + normalization
7. Illustrate: Imagen 3 hand-drawn cover art
8. Finalize State: Supabase assets → public URLs
Resilience: Fallbacks, retries, and stage isolation keep the agent moving when one model or asset step degrades.
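The stage-isolation idea can be sketched as a small driver that lets non-critical stages fail without killing the job. The stage names echo the eight steps above; which stages count as critical is my assumption.

```python
# Critical stages propagate failure; everything else degrades gracefully.
# The critical set is an illustrative policy, not the production one.
CRITICAL = {"generate_story", "verify_safety"}

def run_stages(stages: dict, job: dict) -> dict:
    for name, fn in stages.items():
        try:
            job[name] = fn(job)
        except Exception as exc:
            if name in CRITICAL:
                raise                      # fatal: the lesson cannot ship
            job[name] = None               # isolate, note, and keep moving
            job.setdefault("warnings", []).append(f"{name}: {exc}")
    return job

def flaky_cover_art(job):
    raise RuntimeError("Imagen timeout")

result = run_stages(
    {"generate_story": lambda j: "story text", "illustrate": flaky_cover_art},
    {},
)
```

A cover-art timeout thus yields a story without an illustration instead of a dead job.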
↓
State + Recovery
StoryGenerationManager polls every 2s, smooths progress, and keeps the UI aligned with the agent's actual state.
Active jobs are saved locally and resumed after app suspension or foreground return.
Completed assets are stored in SwiftData and surfaced with notification + playback UI.
60–90 sec total for full story, audio, and cover-art delivery.
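The progress smoothing described above can be sketched in a platform-agnostic way (the real client is SwiftUI; Python here is only for illustration): shown progress moves monotonically and never jumps more than one step per poll.

```python
def smooth_progress(reported, max_step=0.25):
    """Smooth server-reported progress: monotonic and rate-limited per poll.

    `max_step` is an illustrative cap, not the production tuning.
    """
    shown, out = 0.0, []
    for p in reported:
        shown = max(shown, min(p, shown + max_step))
        out.append(shown)
    return out

# A backend that jumps ahead and briefly regresses still renders smoothly.
ui_values = smooth_progress([0.0, 0.5, 0.2, 1.0])
```

This keeps the progress bar honest about the agent's real state while hiding polling jitter from the parent.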
Tool-Using Agent
The runtime does more than generate text. It coordinates retrieval, safety, narration, audio, illustration, and delivery as one educational job.
Fast Coaching + Deep Lesson
Grounded parent guidance arrives immediately through pgvector-backed retrieval, while a longer multimodal flow turns the same moment into a complete story-based teaching experience.
Agency Over Dependency
The goal is to strengthen the parent's ability to teach through stories, not to create passive dependence on AI-generated advice.
Educational Behaviors
The value comes from learning design, not just model calls.
The hard part was not API access. It was building a runtime that can interpret intent, frame a lesson,
use tools, enforce guardrails, and finish the task even when individual services wobble.
Interpretation + Learning Design
The runtime turns ambiguous parent input into a structured educational task instead of sending raw prose straight into generation.
Normalizes behavior, context, pressure, and growth goals into agent-readable structure.
Builds runtime context from modular prompt files, learning goals, and request variables.
Supports request-time personalization without maintaining separate hard-coded lesson flows.
Keeps the system adaptable as new developmental goals, tones, and story modes are added.
Grounding + Guardrails
The fast lane is grounded by retrieval, and the slow lane is checked again before anything reaches the family.
Supabase PostgreSQL + pgvector powering semantic retrieval over a curated parenting guidance corpus.
Structured bilingual response instead of raw chatbot prose.
Prompt-level safety constraints plus second-pass Gemini moderation.
Unsafe or developmentally weak generations are rejected and regenerated before delivery.
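The reject-and-regenerate behavior can be sketched as a bounded loop around generation and moderation. The two callables stand in for the Gemini generation and moderation passes; the attempt budget is an assumption.

```python
def generate_with_guardrails(generate, moderate, max_attempts=3):
    """Regenerate until the moderation gate passes or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        story = generate(attempt)
        if moderate(story):
            return story
    raise RuntimeError("no safe generation within attempt budget")

# Stand-ins: the first draft fails the gate, the second passes.
drafts = iter(["too scary", "gentle story about sharing"])
story = generate_with_guardrails(
    lambda _: next(drafts),
    lambda s: "scary" not in s,
)
```

Failing closed (raising after the budget) matters here: for a children's product, no story is better than an unchecked one.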
Storytelling as Instruction
The runtime coordinates multiple specialized tools to turn a lesson objective into an engaging story-based teaching artifact.
ElevenLabs voice selection by language, tone, and narrator style.
pydub normalization and background music mixing.
Imagen 3 cover art generation for each completed story.
Separate video pipeline publishes long-form educational story content to YouTube.
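The audio-composition step, ducked background music plus normalization, can be sketched in pure Python over raw samples. In production this is pydub territory; the gain and mixing values here are illustrative only.

```python
def mix(narration, music, music_gain=0.3):
    """Mix equal-length sample lists, attenuating music under the voice."""
    return [n + music_gain * m for n, m in zip(narration, music)]

def peak_normalize(samples, target_peak=1.0):
    """Scale so the loudest sample hits target_peak (pydub-style normalize)."""
    peak = max(abs(s) for s in samples)
    return [s * (target_peak / peak) for s in samples] if peak else samples

mixed = peak_normalize(mix([0.5, -0.5], [1.0, 1.0]))
```

Normalizing after the mix keeps output loudness consistent across stories regardless of narrator or music track.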
The runtime is designed to recover cleanly when models or assets fail, so the lesson still reaches the parent.
Model fallback chain from gemini-3-pro to gemini-2.5-pro.
TTS, cover art, and BGM failures are isolated instead of fatal.
Local job persistence and notification-based completion recovery on iOS.
Production deployment runs across Supabase and Railway with live asset storage.
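The fallback chain above can be sketched as an ordered try-loop. The model ids are copied from the list; call_model stands in for the real Gemini call, and the error handling is an illustrative policy.

```python
FALLBACK_CHAIN = ["gemini-3-pro", "gemini-2.5-pro"]

def generate_with_fallback(call_model, chain=FALLBACK_CHAIN):
    """Try each model in order; fall through on failure, fail loudly at the end."""
    last_error = None
    for model in chain:
        try:
            return model, call_model(model)
        except Exception as exc:
            last_error = exc
    raise RuntimeError(f"all models failed: {last_error}")

def call(model):
    if model == "gemini-3-pro":
        raise TimeoutError("primary overloaded")
    return "story"

used, text = generate_with_fallback(call)
```

Because the chain is ordered data rather than branching code, adding or reordering models is a one-line change.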
Agent Stack
Components behind the runtime.
The stack is organized around the agent job model: client capture, vector-grounded retrieval, execution,
multimodal tool calls, durable state, and production operations.
CLIENT
SwiftUI + SwiftData
MVVM client, async/await, local persistence, progress UI, Mixpanel, and Sentry.
EDGE
Supabase Edge Functions
JWT validation, rate limiting, embeddings-based semantic search with pgvector, and fast runtime handoff.
RUNTIME
FastAPI + SQLAlchemy
Stateful async orchestration, state transitions, retries, multi-step tool execution, and public asset finalization.
TOOLS
Gemini + Imagen + ElevenLabs
Lesson planning and story generation, moderation, illustration, and multilingual narration.