[ LIVE ] JUN 2026

Podcast Production Pipeline

Podcast / Claude · ElevenLabs / Automation
~30 min: Voice note → live
AI: Voice clone
Auto: Social clips

Problem

Producing a podcast episode normally means writing a script, recording clean audio, editing, building show notes, publishing across platforms, and cutting social clips by hand. Each step is its own task and its own bottleneck. For a personal podcast I wanted the friction gone entirely. I wanted to capture an idea the moment I had it, talking loosely without worrying about delivery or environment, and have everything downstream happen automatically.

Approach

I split the work into two pipelines: one to produce and publish the episode, one to generate social clips. The first decision was to keep my actual recording effort near zero. I speak freeform into a voice note, jumping between ideas, with an air conditioner running or sirens in the background, because nothing past the transcript depends on audio quality. With recording out of the way, the constraint moved to consistency, so I built guardrails into a Claude Code project that define episode structure, focus, tone, and voice, anchored with examples of correct scripts. That keeps script generation on-voice without my steering it each time.

What I shipped

A two-stage automated system.

Stage one: a voice memo transcript feeds into Claude Code, which generates a structured script under the guardrails. The script goes to ElevenLabs, where a voice clone trained on two hours of my voice reads it. A Suno theme song plays under intro text dictated by the guardrails. The system exports an MP3 and a VTT transcript. A publishing pipeline then writes the audio into the podcast XML feed, builds a website page, attaches the transcript, summarizes the episode into show notes with timestamps pulled from the VTT, and pushes it all live to the website and every podcast platform.

Stage two: a clip command scans each episode and pulls five clips (three short, one medium, one long), cutting cleanly at thought boundaries. Each clip becomes a video with a branded graphic, VTT-driven subtitles, the podcast title and logo, and an animated waveform synced to the audio.
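The VTT transcript does double duty here: it supplies timestamps for show notes and cut points for clips. The sketch below illustrates one way that could work. It is not the project's actual code. The names `parse_vtt`, `format_timestamp`, and `pick_clip` are hypothetical, the sample transcript is invented, and a "thought boundary" is approximated as a cue that ends in sentence-final punctuation, which is an assumption rather than the method used in the pipeline.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Cue:
    start: float  # seconds from the start of the episode
    end: float    # seconds from the start of the episode
    text: str


# WebVTT cue timing line, e.g. "00:00:04.000 --> 00:00:09.500" (hours optional).
_TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+"
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)


def _seconds(h, m, s, ms) -> float:
    return int(h or 0) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_vtt(vtt: str) -> List[Cue]:
    """Parse the cue blocks of a WebVTT document into Cue objects."""
    cues = []
    for block in re.split(r"\n\s*\n", vtt.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = _TIMING.search(line)
            if match:
                g = match.groups()
                # Everything after the timing line is the cue's text.
                text = " ".join(l.strip() for l in lines[i + 1:] if l.strip())
                cues.append(Cue(_seconds(*g[:4]), _seconds(*g[4:]), text))
                break  # blocks without a timing line (header, notes) are skipped
    return cues


def format_timestamp(seconds: float) -> str:
    """Format seconds as M:SS or H:MM:SS for show notes."""
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}" if h else f"{m}:{s:02d}"


def pick_clip(
    cues: List[Cue], start_index: int, min_len: float, max_len: float
) -> Optional[Tuple[float, float]]:
    """Return (start, end) for the first clip that begins at cues[start_index],
    lasts between min_len and max_len seconds, and ends on a cue whose text
    finishes a sentence. Returns None if no such end point exists."""
    start = cues[start_index].start
    for cue in cues[start_index:]:
        length = cue.end - start
        if length > max_len:
            break
        if length >= min_len and cue.text.rstrip().endswith((".", "?", "!")):
            return start, cue.end
    return None


# Invented sample transcript, for illustration only.
SAMPLE_VTT = """WEBVTT

00:00:00.000 --> 00:00:04.000
Welcome back to the show.

00:00:04.000 --> 00:00:09.500
Today I want to talk about

00:00:09.500 --> 00:00:15.000
why friction kills ideas.

00:00:15.000 --> 00:00:21.000
Here is the first example.
"""
```

With the sample above, `pick_clip(parse_vtt(SAMPLE_VTT), 1, 5, 15)` passes over the cue that stops mid-sentence and returns `(4.0, 15.0)`, so the clip ends on a complete sentence. Calling it with different `min_len` and `max_len` values is one way to get short, medium, and long clips from the same transcript.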

Outcome

I go from speaking into a voice note to five social clips linking to a live episode in about thirty minutes, with no manual editing, recording retakes, or platform-by-platform publishing.