Product walkthrough — VR karaoke + AI harmony pipeline
CircleMic is a VR music platform where users sing in immersive virtual rooms alongside personalized AI companions — each one configured from a user-uploaded 3D avatar, voice sample, and personality description, then brought to life with voice-cloned cover/harmony singing, lip-sync, and spatial behaviors.
As co-founder, I led full-stack web development and the AI song-processing pipeline, and shipped the VR client experience end to end, from device-code OAuth account linking and companion configuration to in-headset lip-sync and motion behaviors. CircleMic is currently in internal beta.
CircleMic is built across three tightly coupled layers — a Next.js web app for companion configuration, a GPU-backed AI song pipeline on RunPod, and a Unity VR client linked to the same account via device-code OAuth:
🌐 Web App (Next.js / TypeScript)
Account, AI companion configuration (3D avatar upload, voice sample, personality text), song library, and device-code pairing flow that links the web account to the VR headset.
🎵 AI Song Pipeline (RunPod GPU)
Demucs source separation → Whisper lyric / LRC alignment → key & vocal pitch contour extraction → in-house harmony model → RVC via Replicate API for AI-companion voice-cloned covers and harmonies.
🥽 VR Client (Unity / Quest 3)
VRM avatar runtime, AI companion lip-sync, spatial following, scripted motion behaviors, and shared singing sessions — all bound to the same account through the device-code OAuth bridge.
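The pipeline chain above implies a small job graph. Stages that depend only on the track (separation, lyric alignment, pitch analysis) run once per song, and voice-specific stages run once per companion. The Python sketch below plans that graph. The stage names, the `Job` type, and the hash-based job IDs are hypothetical. Two points are assumptions rather than descriptions of CircleMic's scheduler: that "deterministic" means content-addressed, re-runnable jobs, and that harmony lines are generated per companion.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Job:
    job_id: str
    stage: str
    song_id: str
    companion_id: Optional[str] = None
    depends_on: Tuple[str, ...] = ()


def make_job_id(song_id: str, stage: str, companion_id: Optional[str] = None) -> str:
    # Content-addressed ID: the same (song, stage, companion) always maps to the same job,
    # so a retried upload can reuse finished work instead of re-running it.
    key = f"{song_id}|{stage}|{companion_id or '-'}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


def plan_song(song_id: str, companion_ids: List[str]) -> List[Job]:
    """Shared stages run once per song; voice-specific stages fan out per companion."""
    sep = make_job_id(song_id, "separate")
    align = make_job_id(song_id, "align_lyrics")
    pitch = make_job_id(song_id, "analyze_pitch")
    jobs = [
        Job(sep, "separate", song_id),                       # Demucs stems
        Job(align, "align_lyrics", song_id, None, (sep,)),   # Whisper on the vocal stem
        Job(pitch, "analyze_pitch", song_id, None, (sep,)),  # key + pitch contour
    ]
    for cid in companion_ids:
        harm = make_job_id(song_id, "harmony", cid)
        rvc = make_job_id(song_id, "rvc", cid)
        jobs.append(Job(harm, "harmony", song_id, cid, (pitch,)))
        jobs.append(Job(rvc, "rvc", song_id, cid, (sep, harm)))  # needs stems + harmony lines
    return jobs
```

Because the IDs are derived from (song, stage, companion), re-planning a song after a crash yields the same IDs. A worker can therefore skip any job whose output already exists.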
Each user-uploaded song fans out through a deterministic GPU pipeline so that, by the time it lands in the VR room, every companion already has its own voice-cloned cover and matching harmony lines:
🎙️ Source Separation — Demucs on RunPod
Deployed Demucs as a containerized RunPod worker that splits each track into vocals, drums, bass, and other stems. Autoscaling workers absorb upload spikes, and cold-start budgeting keeps end-to-end latency predictable.
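A containerized RunPod worker is essentially one handler function. The sketch below assumes a local audio path in the job input and shells out to the Demucs CLI (`python -m demucs -n htdemucs -o OUT track`). By default, Demucs writes one WAV per stem under `OUT/htdemucs/<track>/`. The input keys `audio_path`, `out_dir`, and `model` are hypothetical. A production worker would more likely download from and upload to object storage.

```python
import subprocess
import sys
from pathlib import Path
from typing import Any, Dict, List

STEMS = ("vocals", "drums", "bass", "other")  # Demucs 4-stem output


def demucs_command(audio_path: str, out_dir: str, model: str = "htdemucs") -> List[str]:
    """Demucs CLI invocation: -n picks the pretrained model, -o the output root."""
    return [sys.executable, "-m", "demucs", "-n", model, "-o", str(out_dir), str(audio_path)]


def stem_paths(audio_path: str, out_dir: str, model: str = "htdemucs") -> Dict[str, str]:
    """Default Demucs layout: <out_dir>/<model>/<track name>/<stem>.wav."""
    base = Path(out_dir) / model / Path(audio_path).stem
    return {stem: str(base / f"{stem}.wav") for stem in STEMS}


def handler(job: Dict[str, Any]) -> Dict[str, Any]:
    """RunPod-style handler: the job payload arrives under job["input"]."""
    params = job["input"]
    audio_path = params["audio_path"]  # hypothetical input key
    out_dir = params.get("out_dir", "/tmp/separated")
    model = params.get("model", "htdemucs")
    subprocess.run(demucs_command(audio_path, out_dir, model), check=True)
    return {"stems": stem_paths(audio_path, out_dir, model)}
```

Inside the container, the handler would be registered with the RunPod SDK via `runpod.serverless.start({"handler": handler})`. Worker counts and idle or cold-start behavior are typically configured on the endpoint rather than in this code.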
📝 Lyric Alignment — Whisper
Word-level timestamps from Whisper feed an LRC generator. The VR client streams the resulting LRC alongside the audio to display synchronized lyrics in the room.
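The following is a minimal LRC generator. It assumes segments shaped like the output of openai-whisper's `transcribe(..., word_timestamps=True)`, where each segment carries a `words` list with `word`, `start`, and `end`. The generator emits one `[mm:ss.xx]` line per segment. Optionally, it adds inline `<mm:ss.xx>` word tags in the "enhanced LRC" style used for karaoke highlighting. The function names are hypothetical.

```python
from typing import Any, Dict, List


def lrc_timestamp(seconds: float) -> str:
    """Format seconds as LRC mm:ss.xx (centisecond precision)."""
    cs = int(round(seconds * 100))
    minutes, rem = divmod(cs, 6000)
    secs, hundredths = divmod(rem, 100)
    return f"{minutes:02d}:{secs:02d}.{hundredths:02d}"


def segments_to_lrc(segments: List[Dict[str, Any]], word_tags: bool = True) -> str:
    """Build LRC text from Whisper-style segments ({"start", "end", "text", "words": [...]})."""
    lines = []
    for seg in segments:
        words = seg.get("words") or []
        if not words:
            # No word timings: fall back to a plain line stamped at the segment start.
            text = seg.get("text", "").strip()
            if text:
                lines.append(f"[{lrc_timestamp(seg['start'])}]{text}")
            continue
        if word_tags:
            # Enhanced LRC: an inline <mm:ss.xx> tag before each word for karaoke highlighting.
            body = " ".join(f"<{lrc_timestamp(w['start'])}>{w['word'].strip()}" for w in words)
        else:
            body = " ".join(w["word"].strip() for w in words)
        lines.append(f"[{lrc_timestamp(words[0]['start'])}]{body}")
    return "\n".join(lines)
```

Stamping each line at its first word's start, rather than the segment start, keeps the line from lighting up during any leading silence that Whisper folds into the segment.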
🎼 Key & Pitch Analysis + Harmony Model
Extracted the song key and the lead vocal's pitch contour, then trained a harmony model that proposes companion harmony lines aligned to the lead vocal. This is the musical contract that lets multiple AI companions sing with the user rather than over them.
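CircleMic's harmony model is in-house and not described here, so the sketch below substitutes a plain rule-based baseline. Given an already-estimated key, it shifts each voiced frame of the lead contour by a fixed number of scale steps (2 is a diatonic third). It is a reference point for what a learned model has to beat, not the model itself. All names are hypothetical, and non-scale lead notes are simply snapped down to the nearest scale tone.

```python
import bisect
import math
from typing import List, Optional, Sequence

MAJOR_STEPS = (0, 2, 4, 5, 7, 9, 11)
MINOR_STEPS = (0, 2, 3, 5, 7, 8, 10)  # natural minor


def scale_notes(tonic_pc: int, mode: str) -> List[int]:
    """All MIDI notes (0-127) in the key. tonic_pc: 0=C, 1=C#, ..., 9=A, 11=B."""
    steps = MAJOR_STEPS if mode == "major" else MINOR_STEPS
    pcs = {(tonic_pc + s) % 12 for s in steps}
    return [n for n in range(128) if n % 12 in pcs]


def hz_to_midi(freq_hz: float) -> int:
    return int(round(69 + 12 * math.log2(freq_hz / 440.0)))


def harmony_note(lead_midi: int, tonic_pc: int, mode: str, degrees: int = 2) -> Optional[int]:
    """Move `degrees` scale steps from the lead (2 = third above, -2 = third below)."""
    notes = scale_notes(tonic_pc, mode)
    i = bisect.bisect_right(notes, lead_midi) - 1  # scale tone at or below the lead
    j = i + degrees
    if i < 0 or not 0 <= j < len(notes):
        return None
    return notes[j]


def harmony_line(contour_hz: Sequence[Optional[float]], tonic_pc: int, mode: str,
                 degrees: int = 2) -> List[Optional[int]]:
    """Frame-by-frame harmony for a lead pitch contour; unvoiced frames (None/0) stay silent."""
    return [harmony_note(hz_to_midi(f), tonic_pc, mode, degrees) if f and f > 0 else None
            for f in contour_hz]
```

Moving in scale steps rather than a fixed interval is what keeps the line diatonic. A third above C in C major is a major third (E), but a third above E is a minor third (G).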
🗣️ Voice Cloning — RVC via Replicate API
Each companion is bound to a user-supplied voice sample. RVC inference runs through Replicate so cover vocals and harmony stems carry the companion's unique voice identity.
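This is a sketch of the per-companion RVC fan-out using the official `replicate` Python client, whose `replicate.run(model_ref, input={...})` call runs a model and returns its output. Several details are assumptions, because each RVC deployment on Replicate defines its own input schema: the model reference, the input field names (`input_audio`, `voice_model`), and the idea of passing a per-companion voice-model URL derived from the voice sample. `run_fn` is injectable so the logic can be exercised without network access.

```python
from typing import Any, Callable, Dict, List, Optional


def build_rvc_jobs(companion_id: str, voice_model_url: str,
                   stems: Dict[str, str]) -> List[Dict[str, Any]]:
    """One conversion job per stem (lead cover, each harmony line) for a single companion."""
    return [
        {
            "companion_id": companion_id,
            "stem": name,
            # Hypothetical input schema: real field names depend on the deployed RVC model.
            "input": {"input_audio": url, "voice_model": voice_model_url},
        }
        for name, url in sorted(stems.items())
    ]


def run_rvc_jobs(jobs: List[Dict[str, Any]], model_ref: str,
                 run_fn: Optional[Callable[..., Any]] = None) -> Dict[str, Any]:
    """Run each job through Replicate and collect outputs keyed by stem name."""
    if run_fn is None:
        import replicate  # official client; reads REPLICATE_API_TOKEN from the environment
        run_fn = replicate.run
    return {job["stem"]: run_fn(model_ref, input=job["input"]) for job in jobs}
```

Because the companion's voice reference is attached to every job, the lead cover and each harmony stem come back in the same voice identity.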
Device-Code OAuth Bridge
Designed and implemented a device-code OAuth flow between the Next.js web app and the Unity VR client. Users log in once on the web, then enter a short code in VR to bind the headset to the same account, with no password entry in the headset.
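The device authorization grant is standardized in RFC 8628. In its standard shape, the device (here, the headset) requests a `device_code`/`user_code` pair. A signed-in user then confirms the short `user_code` on the web, and the device polls until the grant is approved. CircleMic's exact pairing direction and UI may differ. The in-memory store below is a hypothetical sketch of that server-side state machine, not the production implementation.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# RFC 8628 section 6.1 suggests a short consonant-only code (base 20, 8 chars) to avoid typos.
USER_CODE_ALPHABET = "BCDFGHJKLMNPQRSTVWXZ"
USER_CODE_LENGTH = 8


@dataclass
class Grant:
    device_code: str
    user_code: str
    expires_at: float
    account_id: Optional[str] = None


class DeviceCodeStore:
    """In-memory pairing state; a real deployment would persist grants in the app database."""

    def __init__(self, ttl_seconds: float = 600, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._by_device: Dict[str, Grant] = {}
        self._by_user: Dict[str, Grant] = {}

    def start(self) -> Grant:
        """Headset side: request a new device_code/user_code pair."""
        user_code = ""
        while not user_code or user_code in self._by_user:
            user_code = "".join(secrets.choice(USER_CODE_ALPHABET) for _ in range(USER_CODE_LENGTH))
        grant = Grant(secrets.token_urlsafe(32), user_code, self.clock() + self.ttl)
        self._by_device[grant.device_code] = grant
        self._by_user[user_code] = grant
        return grant

    def approve(self, user_code: str, account_id: str) -> bool:
        """Web side: a signed-in user confirms the short code, binding it to their account."""
        grant = self._by_user.get(user_code.strip().upper().replace("-", ""))
        if grant is None or self.clock() > grant.expires_at:
            return False
        grant.account_id = account_id
        return True

    def poll(self, device_code: str) -> Dict[str, str]:
        """Headset side: error strings follow RFC 8628 section 3.5 (plus RFC 6749 invalid_grant)."""
        grant = self._by_device.get(device_code)
        if grant is None:
            return {"error": "invalid_grant"}
        if self.clock() > grant.expires_at:
            return {"error": "expired_token"}
        if grant.account_id is None:
            return {"error": "authorization_pending"}
        # Redeem exactly once, then forget both codes.
        del self._by_device[device_code]
        del self._by_user[grant.user_code]
        return {"account_id": grant.account_id, "access_token": secrets.token_urlsafe(32)}
```

In a Next.js deployment, these three methods would map onto API routes backed by the database rather than process memory. RFC 8628 also defines a `slow_down` error for clients that poll too fast, which this sketch omits.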
Personalized AI Companions
Companion identity is fully user-defined: uploaded 3D avatar (VRM), voice sample (drives RVC), and a free-form personality description. The web dashboard ties these into a single companion profile that the VR client materializes at runtime.
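One way to model the single companion profile is a small validated record that the dashboard writes and the VR client reads. The field names, accepted voice-sample formats, and 2,000-character personality limit below are assumptions for illustration. The source only states that a profile combines a VRM avatar, a voice sample, and a personality description.

```python
from dataclasses import asdict, dataclass
from pathlib import PurePosixPath
from typing import Dict, List
from urllib.parse import urlparse

VOICE_SAMPLE_EXTS = {".wav", ".mp3", ".flac"}  # assumed accepted formats
MAX_PERSONALITY_CHARS = 2000                   # assumed limit


def _ext(url: str) -> str:
    """File extension of a URL's path, ignoring any query string."""
    return PurePosixPath(urlparse(url).path).suffix.lower()


@dataclass(frozen=True)
class CompanionProfile:
    companion_id: str
    owner_id: str
    avatar_url: str        # uploaded VRM model
    voice_sample_url: str  # reference audio that drives RVC
    personality: str       # free-form description

    def validate(self) -> List[str]:
        errors = []
        if _ext(self.avatar_url) != ".vrm":
            errors.append("avatar must be a .vrm file")
        if _ext(self.voice_sample_url) not in VOICE_SAMPLE_EXTS:
            errors.append("voice sample must be wav, mp3, or flac")
        if not self.personality.strip():
            errors.append("personality description is required")
        elif len(self.personality) > MAX_PERSONALITY_CHARS:
            errors.append(f"personality must be at most {MAX_PERSONALITY_CHARS} characters")
        return errors

    def runtime_payload(self) -> Dict[str, str]:
        """What the VR client would fetch to materialize the companion."""
        if self.validate():
            raise ValueError("invalid companion profile")
        return asdict(self)
```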
Lip-Sync & Spatial Following
Built blendshape-based lip-sync driven by the RVC-generated companion vocals, plus a spatial-following controller that keeps companions moving naturally with the user inside the VR room.
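Two hypothetical building blocks illustrate the idea. `mouth_open_weights` uses a simple amplitude-driven approach (not viseme classification) to turn the companion's vocal stem into one smoothed mouth-open blendshape weight per frame. `follow_step` eases the companion toward a slot beside and slightly behind the user, using frame-rate-independent exponential smoothing. Both are written in Python for readability; in the Unity client, the same logic would run per frame in C#. The offsets and smoothing constants are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Vec2 = Tuple[float, float]  # (x, z) on the floor plane; Unity-style +Z forward, +X right


def mouth_open_weights(samples: Sequence[float], sample_rate: int, fps: int = 60,
                       gain: float = 4.0, attack: float = 0.5, release: float = 0.15) -> List[float]:
    """One 0..1 mouth-open weight per rendered frame from the companion's vocal audio.

    Per-frame RMS is scaled and clamped, then smoothed with a faster attack than release
    so the mouth opens crisply on syllables and closes gently.
    """
    hop = max(1, sample_rate // fps)
    level, weights = 0.0, []
    for start in range(0, len(samples), hop):
        frame = samples[start:start + hop]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        target = min(1.0, rms * gain)
        level += (attack if target > level else release) * (target - level)
        weights.append(level)
    return weights


def follow_step(companion: Vec2, user: Vec2, user_forward: Vec2, dt: float,
                right_offset: float = 0.8, forward_offset: float = -0.6,
                stiffness: float = 4.0) -> Vec2:
    """Ease the companion toward a slot beside/behind the user, independent of frame rate."""
    fx, fz = user_forward
    norm = math.hypot(fx, fz) or 1.0
    fx, fz = fx / norm, fz / norm
    rx, rz = fz, -fx  # right vector in a Y-up, left-handed (Unity) frame
    target = (user[0] + rx * right_offset + fx * forward_offset,
              user[1] + rz * right_offset + fz * forward_offset)
    alpha = 1.0 - math.exp(-stiffness * dt)
    return (companion[0] + alpha * (target[0] - companion[0]),
            companion[1] + alpha * (target[1] - companion[1]))
```

Using `1 - exp(-stiffness * dt)` instead of a fixed lerp factor means the companion covers the same fraction of the gap per second whether the headset renders at 72, 90, or 120 Hz.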
Scripted Motion Behaviors
Authored a behavior layer for companions — entrance cues, idle gestures, song-section reactions — so each shared singing session feels performed, not just rendered.
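A behavior layer like this can be driven by a time-ordered cue track. Song-section boundaries (for example, from the pipeline's analysis) map to reactions, and each frame fires the cues that the playback clock crossed since the previous frame. The `Cue` type, action names, and section labels below are hypothetical.

```python
import bisect
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True, order=True)
class Cue:
    time_s: float
    action: str  # hypothetical action names, e.g. "entrance", "chorus_clap", "bow"


def cues_from_sections(sections: Sequence[Tuple[float, str]],
                       reactions: Dict[str, str]) -> List[Cue]:
    """Turn (start_time, section_label) pairs into reaction cues; unmapped sections get none."""
    return sorted(Cue(start, reactions[label]) for start, label in sections if label in reactions)


class CueTrack:
    """Fires each cue once as the playback clock advances past it."""

    def __init__(self, cues: Sequence[Cue]):
        self.cues = sorted(cues)
        self.times = [c.time_s for c in self.cues]

    def due(self, prev_t: float, now_t: float) -> List[Cue]:
        """Cues with prev_t < time_s <= now_t, i.e. those crossed during this frame."""
        lo = bisect.bisect_right(self.times, prev_t)
        hi = bisect.bisect_right(self.times, now_t)
        return self.cues[lo:hi]
```

Querying the half-open window (prev, now] means a cue fires exactly once even when frame times are irregular. After a seek, the caller can pass the new position as `prev_t` to skip the cues it jumped over.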
🌐 Web App
Next.js + TypeScript + Prisma — account, companion configuration, library, device-code pairing
🎵 AI Song Pipeline
FastAPI + RunPod GPU workers (Demucs / Whisper / pitch / harmony) + Replicate API (RVC)
🥽 VR Client
Unity / Quest 3 — VRM companions, lip-sync, spatial following, scripted motion
📄 Technical Whitepaper
System architecture, API contracts, and pipeline data flow for the internal beta