Podcast Production Pipeline

Podcast · Claude · ElevenLabs · Automation

I built an end-to-end pipeline that turns a freeform voice memo into a fully published podcast episode plus five ready-to-share social clips. It takes about thirty minutes per episode.

I speak unscripted into a voice note, and the transcript flows through a chain of AI tools: Claude Code with guardrails writes the script in my structure and voice, ElevenLabs reads it back in a clone of my own voice, and an automated pipeline assembles the audio, generates show notes with timestamps, and publishes the episode live to the website, Apple, Spotify, and Amazon.

A separate clip command then finds five social cuts, lays them over branded video with subtitles and an animated waveform, and outputs them ready to post. The whole thing runs from a babbled voice note, background noise and all, to live distribution without manual editing.
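As a rough illustration of how a clip command might avoid cutting mid-sentence, the sketch below grows a clip from a starting subtitle cue until it reaches a target length, then keeps extending until a cue ends with sentence punctuation. The `Cue` type, the `snap_clip` helper, and the punctuation rule are assumptions made for this example, not the pipeline's actual logic; choosing which moments are worth clipping is a separate step not shown here.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds from the start of the episode
    end: float    # seconds from the start of the episode
    text: str     # subtitle text for this cue


def snap_clip(cues, first, target_seconds):
    """Return (start, end) for a clip beginning at cues[first].

    The clip grows cue by cue until it is at least target_seconds long
    AND the last cue ends a sentence. If no such cue exists, the clip
    runs to the final cue. (Hypothetical helper, for illustration only.)
    """
    end_index = first
    for i in range(first, len(cues)):
        end_index = i
        long_enough = cues[i].end - cues[first].start >= target_seconds
        ends_sentence = cues[i].text.rstrip().endswith((".", "?", "!"))
        if long_enough and ends_sentence:
            break
    return cues[first].start, cues[end_index].end
```

Calling `snap_clip` with a different `target_seconds` for each of the short, medium, and long cuts would give clips of different lengths that all end on a finished sentence.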

~30 min: Voice note → live

AI: Voice clone

Auto: Social clips

Problem

Producing a podcast episode normally means writing a script, recording clean audio, editing, building show notes, publishing across platforms, and cutting social clips by hand. Each step is its own task and its own bottleneck. For a personal podcast I wanted the friction gone entirely. I wanted to capture an idea the moment I had it, talking loosely without worrying about delivery or environment, and have everything downstream happen automatically.

Approach

I split the work into two pipelines: one to produce and publish the episode, one to generate social clips. The first decision was to keep my actual recording effort to near zero. I speak into a voice note freeform, jumping around, with an air conditioner running or sirens in the background, because nothing past the transcript depends on audio quality. From there the constraint moved to consistency, so I built guardrails into a Claude Code project that define episode structure, focus, tone, and voice, anchored with examples of correct scripts. That lets the script generation stay on-voice without me steering it each time.

What I shipped

A two-stage automated system.

Stage one: a voice memo transcript feeds into Claude Code, which generates a structured script under the guardrails. The script goes to ElevenLabs, trained on two hours of my voice, which reads it in a clone of me. A Suno theme song sits under intro text dictated by the guardrails. The system exports an MP3 and a VTT transcript, then a publishing pipeline writes the audio into the podcast XML feed, builds a website page, attaches the transcript, summarizes the episode into show notes with timestamps pulled from the VTT, and pushes it all live to the website and every podcast platform.

Stage two: a clip command scans each episode and pulls five clips (three short, one medium, one long), cutting cleanly at thought boundaries. Each clip becomes a video with a branded graphic, VTT-driven subtitles, the podcast title and logo, and an animated waveform synced to the audio.
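To make the "timestamps pulled from the VTT" step concrete, here is a minimal sketch of one way to do it: parse the WebVTT cue timings, then look up the cue where a section begins and format its start time for the show notes. The function names (`parse_vtt`, `label`, `show_note_line`) and the idea of matching on an anchor phrase are assumptions for illustration; the real pipeline may locate sections differently.

```python
import re

# WebVTT cue timing line, e.g. "00:01:05.500 --> 00:01:09.000".
# The hours field is optional in WebVTT, so "01:05.500" is also accepted.
CUE_TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+"
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)


def parse_vtt(vtt_text):
    """Return a list of (start_seconds, text) pairs, one per cue."""
    cues = []
    lines = vtt_text.splitlines()
    i = 0
    while i < len(lines):
        match = CUE_TIMING.match(lines[i].strip())
        if not match:
            i += 1
            continue
        h, m, s, ms = match.group(1, 2, 3, 4)
        start = int(h or 0) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000
        i += 1
        # Cue text runs until the next blank line.
        text_lines = []
        while i < len(lines) and lines[i].strip():
            text_lines.append(lines[i].strip())
            i += 1
        cues.append((start, " ".join(text_lines)))
    return cues


def label(seconds):
    """Format seconds as M:SS, or H:MM:SS for episodes over an hour."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def show_note_line(cues, phrase, title):
    """Timestamp a section by the first cue containing `phrase`.

    `phrase` is a hypothetical anchor, e.g. the opening words of a
    section as returned by the summarization step.
    """
    for start, text in cues:
        if phrase.lower() in text.lower():
            return f"{label(start)} {title}"
    return None
```

Because the audio is generated from the script, the VTT text should match the script closely, which is what makes a simple phrase lookup like this plausible.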

Outcome

I go from speaking into a voice note to five social clips linking to a live episode in about thirty minutes, with no manual editing, recording retakes, or platform-by-platform publishing.
