Portfolio Project

I built an AI-powered children's storytelling app from zero to production.

Choo Choo Story Time generates personalized, narrated bedtime stories in under 90 seconds by orchestrating 4 AI services from a single parent input. Solo-built as founder and full-stack engineer.

SwiftUI FastAPI Google Gemini ElevenLabs TTS Supabase Railway Claude Code Python
See How It Works ↓
About Me
Solo Founder & Full-Stack AI Engineer

I'm Viktor Zhai. I designed, built, and shipped Choo Choo Story Time — a personalized children's storytelling app that turns a parent's 30-second behavioral description into a fully narrated, illustrated bedtime story. I owned every layer: product vision, iOS frontend, Python backend, AI prompt engineering, deployment, and content safety.

My approach: ship fast, measure, iterate. I used Claude Code as my AI pair-programming partner throughout development — not just for code generation, but for architecture decisions, prompt iteration, and automated content pipelines. I treat AI as a force multiplier, not a crutch.

The app is live on TestFlight with a production backend on Railway. I also built an automated YouTube content pipeline that produces 4K story videos end-to-end using AI (script, artwork, narration, music, video assembly).

4
AI services orchestrated per story (Gemini, ElevenLabs, Imagen, Moderation)
~4,500
Lines of modular prompt engineering across 7 composable prompt files
8
Stage async pipeline with progress tracking, fallback, and recovery
EN / ZH / ES / HI
Multilingual story generation with language-aware voice selection
The Problem
Parents need help in the moment, not a textbook.

The Scenario

It's 7:30 PM. A 4-year-old just threw a block and yelled "NO!" during cleanup. The parent is stressed, tired, and has 10 minutes before bedtime. They don't want a parenting article — they want something that works right now and turns this rough moment into a calm bedtime.

Instant Guidance

RAG-powered parenting guidance card in 3–5 seconds. Actionable strategies based on the specific behavior, not generic advice.

📖

Personalized Story

A bedtime story where the child's own toy is the hero, teaching the lesson through action — not a lecture. Ready in 60–90 seconds.

🎧

Full Audio Experience

AI-narrated audio with background music and hand-drawn cover art. The parent can just press play and breathe.

System Architecture
Split-Latency AI Pipeline
The core insight: parents need two things at different speeds. Instant guidance (3–5s synchronous path) and a bedtime story (60–90s async pipeline). One input, two AI outputs, two speed lanes.
End-to-End System Flow
iOS App (SwiftUI + SwiftData)
Parent describes challenging moment: behavior, context, reaction, time pressure (progressive disclosure via Ghost Zone toggle)
MomentInputViewModel validates input, tracks analytics event via Mixpanel
Supabase Edge Function — Fast Path
POST /generate-structured-response with JWT auth + rate limiting
RAG retrieval over internal parenting_scripts corpus
Returns bilingual (EN/ZH) structured guidance card
3–5 sec — Parent has actionable guidance. Can stop here.
Supabase
PostgreSQL + pgvector for RAG
Storage for audio & cover art
Auth (JWT token validation)
(parent chooses to generate story)
iOS — Story Preview & Customization
StorySummaryViewModel — parent edits tone, moral, setting, length, language
StoryAPIClient.triggerStoryGeneration() dispatches structured payload
Edge Function
POST /trigger-story-generation
INSERT job row (status: pending)
Forward to FastAPI worker
Return jobId instantly (<500ms)
FastAPI Worker (Railway) — 8-Stage AI Pipeline
1 Personalization
Normalize input, infer setting, map growth focus
2 Prompt Assembly
Master Prompt V2: 7 files → ~3,500 words
3 Story Generation
Gemini 330–770 words, temp 0.9
4 Content Moderation
Gemini Dual-layer safety check
5 TTS Synthesis
ElevenLabs Language-aware voice
6 BGM Mixing
pydub — mood-matched music blend
7 Cover Art
Imagen 3 Hand-drawn illustration
8 Upload & Finalize
Supabase assets → public URLs
Resilience: Exponential retry with model fallback (gemini-3-pro → gemini-2.5-pro). TTS, cover art, and BGM fail independently without blocking story delivery.
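The retry-with-fallback shape described above can be sketched as follows. This is a minimal illustration, not the worker's actual code: the `call_model` callable, the broad `except`, and the exact model chain behavior are assumptions.

```python
import time

# Fallback chain from the pipeline description; try the primary model first.
MODEL_CHAIN = ["gemini-3-pro", "gemini-2.5-pro"]

def generate_with_fallback(prompt, call_model, max_attempts=3, base_delay=1.0):
    """Try each model in order; back off exponentially between attempts.

    `call_model(model, prompt)` is a stand-in for the real API client.
    """
    last_error = None
    for model in MODEL_CHAIN:
        for attempt in range(max_attempts):
            try:
                return call_model(model, prompt)
            except Exception as exc:  # production code would catch specific API errors
                last_error = exc
                time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"All models failed: {last_error}")
```

Keeping TTS, cover art, and BGM in separate try/except scopes (not shown) is what lets them fail independently without blocking story delivery.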
iOS — Progress Tracking & Recovery
StoryGenerationManager polls status every 2s with progress smoothing (eased interpolation, caps at 98%)
Job persistence: active jobs saved locally, resumes on foreground return, survives app suspension
On completion: save to SwiftData library, fire local notification, display player with cover art
60–90 sec total — Personalized narrated story with cover art, ready to play.
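The progress smoothing runs in Swift inside `StoryGenerationManager`; the eased-interpolation-with-cap idea can be sketched in Python like this (the easing factor and cap value here are assumptions, not the app's tuned constants):

```python
def smoothed_progress(current: float, target: float,
                      easing: float = 0.15, cap: float = 0.98) -> float:
    """Move the displayed progress a fraction of the way toward the polled
    target each tick, never exceeding the cap until the job truly completes."""
    eased = current + (target - current) * easing
    return min(eased, cap)
```

Capping below 100% avoids showing a full bar while the final upload stage is still in flight.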
AI Deep Dive
What I Actually Built with AI
This isn't a wrapper around a single API call. Each AI integration required domain-specific engineering to produce child-safe, educationally sound, production-quality output.

1. Master Prompt System V2 — Modular Prompt Engineering

Stories are not generated from a monolithic prompt. I built a dynamic prompt builder that composes the final ~3,500-word prompt from 7 modular files at request time. This means changing a safety rule touches one file, not four templates. Each module is independently testable and version-controlled.

Core Principles
Toy-as-hero philosophy
U.S. kindergarten teacher tone
Safety constraints (non-negotiable)
Show, don't tell
~400 lines • All stories
Age Specifications
3–5: simple sentences, concrete, repetition
6–8: complex emotions, cause & effect
~985 / ~768 lines • By child age
Story Templates
Academic: Hook → Problem → Learn → Win
SEL: Trigger → Feel → Strategy → Win
4 templates • By type + age
User Variables
Cast (hero toy + supporting)
Child name, age, pronouns
Moment & growth focus
Setting, language
Injected per request

Assembly: Core_Principles + Age_{3-5|6-8}_Specs + Template_{type}_{age} + Variables → Gemini API
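A minimal sketch of that assembly step. The file names, `{placeholder}` convention, and function signature are illustrative, not the app's actual paths or API:

```python
from pathlib import Path

def build_master_prompt(prompt_dir: Path, age_group: str, story_type: str,
                        variables: dict) -> str:
    """Compose the master prompt from modular files, then inject the
    per-request user variables into {placeholder} slots."""
    parts = [
        (prompt_dir / "core_principles.txt").read_text(),
        (prompt_dir / f"age_{age_group}_specs.txt").read_text(),
        (prompt_dir / f"template_{story_type}_{age_group}.txt").read_text(),
    ]
    prompt = "\n\n".join(parts)
    for key, value in variables.items():
        prompt = prompt.replace("{" + key + "}", str(value))
    return prompt
```

Because each module is a plain file, a safety-rule change is a one-file diff and each module can be unit-tested in isolation.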

2. Dual-Layer Content Safety

Every generated story passes through two independent safety layers before reaching a child. Layer 1: prompt-level constraints baked into Core_Principles (violence, separation anxiety, age-appropriateness). Layer 2: a separate Gemini moderation pass that validates the generated output against child safety criteria. Stories that fail moderation are rejected and regenerated — never delivered.

Why two layers? Prompt constraints are necessary but not sufficient. LLMs can drift. The post-generation moderation pass catches edge cases that slipped through prompt instructions. This is the same defense-in-depth pattern used in production content moderation systems.
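The layer-2 pass reduces to a small validate-or-reject loop. This sketch assumes a `call_gemini` callable and an APPROVED/REJECTED verdict format; both are stand-ins for whatever the moderation prompt actually specifies:

```python
MODERATION_PROMPT = (
    "You are a children's content safety reviewer. Reply with APPROVED, or "
    "REJECTED: <reason>, for the story below.\n\n{story}"
)

def moderate_story(story: str, call_gemini) -> tuple[bool, str]:
    """Second safety layer: validate generated output independently of the
    prompt-level constraints. Rejected stories are regenerated, never delivered."""
    verdict = call_gemini(MODERATION_PROMPT.format(story=story)).strip()
    if verdict.startswith("APPROVED"):
        return True, ""
    reason = verdict.split(":", 1)[1].strip() if ":" in verdict else verdict
    return False, reason
```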

3. RAG-Powered Parenting Guidance

The fast path (3–5 second response) uses Retrieval-Augmented Generation over a curated corpus of parenting strategies. When a parent describes a behavioral challenge, the Supabase Edge Function retrieves relevant strategy documents via vector similarity search, then generates a structured, bilingual guidance card grounded in the retrieved context.

This isn't generic chatbot output. The RAG corpus is curated from evidence-based parenting frameworks, and the retrieval ensures the guidance is specific to the described behavior. The bilingual output (English + Mandarin Chinese) serves our primary user base.
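In production the retrieval is a pgvector similarity query inside a TypeScript Edge Function; the core ranking logic, sketched here in Python over an in-memory corpus with made-up embeddings:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_embedding, corpus, k=3):
    """Return the k strategy documents most similar to the query embedding."""
    ranked = sorted(corpus,
                    key=lambda doc: cosine(query_embedding, doc["embedding"]),
                    reverse=True)
    return ranked[:k]
```

The retrieved documents are then passed as grounding context to the generation call that produces the bilingual guidance card.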

4. Intelligent Audio Pipeline

The TTS stage isn't a simple text-to-speech call. The system selects voices based on language, tone (calm/energetic/playful), and narrator gender. ElevenLabs generates the narration, then pydub handles audio processing: volume normalization, background music mixing with mood-matched tracks, and final MP3 encoding. Voice selection is configurable per-language with fallback chains.

I also built a separate YouTube content pipeline that produces full 4K story videos using ElevenLabs narration with forced-alignment page synchronization — the audio timing drives the visual page turns automatically.

5. Automated YouTube Story Video Pipeline

Beyond the app, I built a fully automated pipeline that produces publish-ready YouTube story videos. Each video goes through 5 stages, all orchestrated by AI:

1
Script AI-written story script with character bible and page breaks
2
Artwork Twenty 4K illustrations per story with consistent character design
3
Narration ElevenLabs TTS with chunked generation for long-form audio
4
Sync Forced-alignment maps audio timestamps to page transitions
5
Export BGM remix + final 4K MP4 with Ken Burns effects
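The sync stage (step 4) reduces to a timestamp mapping: forced alignment yields per-word timings, and the page-break word indices from the script stage pick out the transition points. A sketch with assumed data shapes:

```python
def page_spans(word_timestamps, page_break_indices, total_duration):
    """Return (start, end) times in seconds for each video page.

    `word_timestamps` is a list of {"start": float} dicts from forced
    alignment; `page_break_indices` holds the index of each page's first word.
    """
    starts = [word_timestamps[i]["start"] for i in page_break_indices]
    ends = starts[1:] + [total_duration]
    return list(zip(starts, ends))
```

Each span then becomes one page's on-screen duration in the video assembly step, so the narration timing drives the page turns rather than a fixed schedule.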

6. AI-Augmented Development with Claude Code

I used Claude Code as an AI pair-programming partner throughout the entire project — not just for code completion, but as a collaborator across the full development lifecycle:

Architecture design: Planned the split-latency system, Edge Function → Worker pattern, and prompt modularization strategy through iterative conversation.
Implementation: Built features across SwiftUI, FastAPI, and infrastructure with Claude handling boilerplate while I focused on business logic and product decisions.
Prompt iteration: Refined the Master Prompt System through dozens of generation → evaluate → revise cycles.
Custom agents & skills: Built specialized Claude Code agents for story writing, video production, code review, and QA — each with domain-specific instructions.

The key insight: AI tools are most powerful when you bring strong opinions about what to build. Claude accelerated the how, but every product decision, safety constraint, and architecture tradeoff was mine.

Request Shape
What Gets Sent to the Pipeline
A representative payload showing the structured data that flows from the iOS app through the Edge Function to the FastAPI worker. Every field maps to a prompt assembly decision.
// POST /trigger-story-generation
{
  "summary_id": "a3f8c1d0-...",
  "story_entry_type": "moment",
  "guidance_data": {
    "behavior": "Threw the block and shouted 'No!'",
    "context": "Cleanup before breakfast",
    "what_tried": "Repeated instruction twice",
    "time_pressure": "Need to leave in 10 minutes",
    "other_factors": "Poor sleep last night"
  },
  "child_info": { "name": "Maya", "age": 4, "gender": "they/them" },
  "structured_cast": [
    { "kind": "child", "name": "Maya" },
    { "kind": "character", "name": "T-Rex", "type": "toy", "source": "user_selected" }
  ],
  "setting": { "name": "Home", "category": "domestic" },
  "style": "calm",
  "age_group": "3-5",
  "length": "3-4 min",
  "language": "en",
  "story_type": "character_building",
  "growth_focus": "cooperation"
}
Impact
Tangible Results
3–5s
Time-to-first-value: RAG guidance card before story generation starts
60–90s
End-to-end: parent input to narrated audio story with cover art
~30s
Parent input time via progressive disclosure UI
100%
Dual-layer content moderation: prompt constraints + post-generation validation
4
AI services orchestrated per story: generation, moderation, TTS, illustration
7
Composable prompt files (~4,500 lines) dynamically assembled per request
Key Architecture Decisions

Split-Latency Design

Parents in crisis need help now. The synchronous RAG path gives useful output in 3–5 seconds. The async pipeline runs in background. Two AI outputs from one input — optimized for the parent's emotional state.

Edge Function → Worker Pattern

iOS gets an instant jobId (<500ms) instead of holding HTTP for 90s. Enables background recovery, progress polling, and graceful degradation if any stage fails.

Toy-as-Hero (Not the Child)

Research-backed observational learning (Paw Patrol, Peppa Pig pattern). The toy teaches through action — therapeutic distance lets kids process difficult emotions without feeling lectured. Embedded in prompt architecture.

Modular Prompts Over Monolithic

2 age groups × 2 story types = 4 combos, but core principles and age specs are reused. Changing a safety rule touches one file. Each module is independently testable and version-controlled.

Tech Stack
Built With

SwiftUI + SwiftData

iOS app with MVVM, async/await, local persistence, Mixpanel & Sentry analytics

FastAPI + SQLAlchemy

Python backend with Alembic migrations, async pipeline orchestration, deployed on Railway

🤖

Google Gemini + Imagen

Story generation, content moderation, cover art illustration with model fallback chains

🎤

ElevenLabs + pydub

Multilingual TTS with voice selection, BGM mixing, forced alignment for video sync

📚

Supabase

PostgreSQL + pgvector for RAG, Edge Functions, JWT auth, file storage

🚀

Railway

Production deployment with auto-deploy from main branch, env management

💡

Claude Code

AI pair-programming partner for architecture, implementation, prompt engineering, and custom agents

🎬

FFmpeg + Python

Automated 4K YouTube video pipeline with Ken Burns effects and page-sync narration