Chen Ruolan - CircleMic

Overview

CircleMic is a VR music platform where users sing in immersive virtual rooms alongside personalized AI companions — each one configured from a user-uploaded 3D avatar, voice sample, and personality description, then brought to life with voice-cloned cover/harmony singing, lip-sync, and spatial behaviors.

As co-founder, I led full-stack web development and the AI song-processing pipeline, and shipped the VR client experience end to end, from device-code OAuth account linking and companion configuration through in-headset lip-sync and motion behaviors. CircleMic is currently in internal beta.

Built an AI companion workflow connecting user-uploaded songs, voice samples, VRM avatar assets, and personality profiles into a personalized VR karaoke experience.
Orchestrated a multi-step AI media pipeline using FastAPI, Azure, RunPod, and Replicate to automate source separation, lyric transcription, pitch extraction, harmony generation, and RVC voice rendering.
Designed a RAG-based personality layer that retrieves user preferences, character settings, song context, and prior interactions to generate context-aware dialogue and stage behaviors.
Developed web-to-VR account linking and companion configuration flows, including device-code OAuth, personalized companion loading, lyric display, and Unity-based interaction logic.
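
The retrieval step of the personality layer can be pictured roughly as follows. This is a minimal sketch, not the production code: it assumes a bag-of-words cosine similarity as a stand-in for a real embedding model, and `MemoryItem`, `retrieve`, and `build_prompt` are hypothetical names used only for illustration.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class MemoryItem:
    kind: str  # e.g. "preference", "character", "song", "interaction"
    text: str


def _vector(text: str) -> Counter:
    # Bag-of-words stand-in for an embedding model (assumption).
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, memory: list, k: int = 2) -> list:
    # Rank every stored item by similarity to the query and keep the top k.
    q = _vector(query)
    ranked = sorted(memory, key=lambda m: _cosine(q, _vector(m.text)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, memory: list, k: int = 2) -> str:
    # Retrieved items become the context block that precedes the user turn.
    context = "\n".join(f"[{m.kind}] {m.text}" for m in retrieve(query, memory, k))
    return f"Context:\n{context}\n\nUser: {query}\nCompanion:"
```

The prompt returned by `build_prompt` would then go to whichever language model generates the companion's dialogue; that call is left out here because the source does not name the model.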

System Architecture

CircleMic is built across three tightly coupled layers — a Next.js web app for companion configuration, a GPU-backed AI song pipeline on RunPod, and a Unity VR client linked to the same account via device-code OAuth:

🌐 Web App (Next.js / TypeScript)
Account, AI companion configuration (3D avatar upload, voice sample, personality text), song library, and device-code pairing flow that links the web account to the VR headset.

🎵 AI Song Pipeline (RunPod GPU)
Demucs source separation → Whisper lyric / LRC alignment → key & vocal pitch contour extraction → in-house harmony model → RVC via Replicate API for AI-companion voice-cloned covers and harmonies.

🥽 VR Client (Unity / Quest 3)
VRM avatar runtime, AI companion lip-sync, spatial following, scripted motion behaviors, and shared singing sessions — all bound to the same account through the device-code OAuth bridge.

AI Song-Processing Pipeline

Each user-uploaded song fans out through a deterministic GPU pipeline so that, by the time it lands in the VR room, every companion already has its own voice-cloned cover and matching harmony lines:

🎙️ Source Separation — Demucs on RunPod
Deployed Demucs as a containerized RunPod worker to split tracks into vocals/drums/bass/other. Autoscaling workers absorb upload spikes; cold-start budgeting keeps end-to-end latency predictable.

📝 Lyric Alignment — Whisper
Word-level timestamps from Whisper feed an LRC generator that the VR client streams alongside the audio for in-room synchronized lyrics.
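
A minimal sketch of the LRC generation step, assuming each word arrives as a dict with `word`, `start`, and `end` keys (the shape of Whisper's word-level output). The line-breaking rule (start a new lyric line after a pause longer than `max_gap` seconds) and the function names are assumptions for illustration.

```python
def format_lrc_timestamp(seconds: float) -> str:
    # LRC line tags use [mm:ss.xx], with hundredths of a second.
    total_cs = int(round(seconds * 100))
    minutes, cs = divmod(total_cs, 6000)
    return f"[{minutes:02d}:{cs // 100:02d}.{cs % 100:02d}]"


def words_to_lrc(words, max_gap: float = 0.8) -> str:
    # Group consecutive words into lines, splitting wherever the singer pauses.
    lines, current = [], []
    for w in words:
        if current and w["start"] - current[-1]["end"] > max_gap:
            lines.append(current)
            current = []
        current.append(w)
    if current:
        lines.append(current)
    # Each line is tagged with the start time of its first word.
    return "\n".join(
        format_lrc_timestamp(line[0]["start"])
        + " ".join(x["word"].strip() for x in line)
        for line in lines
    )
```

For example, two words sung back to back at 1.0 s followed by one word at 65.25 s produce the lines `[00:01.00]Hello world` and `[01:05.25]again`.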

🎼 Key & Pitch Analysis + Harmony Model
Extracted song key and vocal pitch contour, then trained a harmony model that proposes companion harmony lines aligned to the lead vocal — the musical contract that lets multiple AI companions sing with the user, not over them.
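
To make the idea of a harmony line aligned to the lead vocal concrete, here is a rule-based stand-in, explicitly not the trained harmony model: it converts each pitch-contour frame from Hz to a MIDI note and places the harmony a diatonic third above it in an assumed major key. The `None` marker for unvoiced frames and all function names are assumptions.

```python
import math

MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]  # semitone offsets from the tonic


def hz_to_midi(freq_hz: float) -> int:
    # Standard conversion with A4 = 440 Hz = MIDI note 69.
    return round(69 + 12 * math.log2(freq_hz / 440.0))


def diatonic_third_above(midi_note: int, tonic_pc: int) -> int:
    # Split the note into an octave and a semitone offset relative to the tonic.
    octave, offset = divmod(midi_note - tonic_pc, 12)
    # Snap out-of-key notes to the nearest scale degree (ties go to the lower one).
    degree = min(range(7), key=lambda d: abs(MAJOR_SCALE[d] - offset))
    # A third above is two scale degrees up, wrapping into the next octave.
    target = degree + 2
    return tonic_pc + 12 * (octave + target // 7) + MAJOR_SCALE[target % 7]


def harmony_line(contour_hz, tonic_pc: int):
    # Unvoiced frames (None) stay silent in the harmony as well.
    return [
        None if f is None else diatonic_third_above(hz_to_midi(f), tonic_pc)
        for f in contour_hz
    ]
```

In C major (`tonic_pc=0`), a lead C4 gets an E4 above it and a lead A4 gets a C5, so the harmony stays inside the key instead of moving in fixed semitone intervals.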

🗣️ Voice Cloning — RVC via Replicate API
Each companion is bound to a user-supplied voice sample. RVC inference runs through Replicate so cover vocals and harmony stems carry the companion's unique voice identity.
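
The stage ordering above can be sketched as a small orchestrator. This is an illustration under stated assumptions, not the FastAPI service itself: each stage is a stub that only records where its output would be stored, and the `SongJob` type, artifact keys, and storage paths are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SongJob:
    song_id: str
    companion_ids: list
    artifacts: dict = field(default_factory=dict)  # artifact name -> storage key
    completed: list = field(default_factory=list)  # stage names, in run order


def separate(job):
    # Demucs stage: one stem file per source.
    for stem in ("vocals", "drums", "bass", "other"):
        job.artifacts[stem] = f"{job.song_id}/{stem}.wav"


def align_lyrics(job):
    job.artifacts["lrc"] = f"{job.song_id}/lyrics.lrc"


def analyze_pitch(job):
    job.artifacts["pitch"] = f"{job.song_id}/pitch.json"


def generate_harmony(job):
    job.artifacts["harmony"] = f"{job.song_id}/harmony.json"


def render_voices(job):
    # Voice-cloning stage fans out once per companion.
    for cid in job.companion_ids:
        job.artifacts[f"cover:{cid}"] = f"{job.song_id}/{cid}/cover.wav"


# (stage name, required artifacts, stage function)
STAGES = [
    ("separate", [], separate),
    ("align_lyrics", ["vocals"], align_lyrics),
    ("analyze_pitch", ["vocals"], analyze_pitch),
    ("generate_harmony", ["pitch"], generate_harmony),
    ("render_voices", ["vocals", "harmony"], render_voices),
]


def run_pipeline(job: SongJob) -> SongJob:
    for name, needs, stage in STAGES:
        missing = [key for key in needs if key not in job.artifacts]
        if missing:
            raise RuntimeError(f"{name} is missing inputs: {missing}")
        stage(job)
        job.completed.append(name)
    return job
```

Declaring each stage's required artifacts makes the data flow explicit and fails fast if a stage is reordered or an upstream step produced nothing.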

VR Client Experience

Device-Code OAuth Bridge
Designed and implemented a device-code OAuth flow between the Next.js web app and the Unity VR client: users log in once on the web, then enter a short code in VR to bind the headset to the same account, with no password entry in the headset.
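
A minimal in-memory sketch of the pairing logic, following the direction described above (code issued on the web, redeemed in the headset); the standard device authorization grant in RFC 8628 displays the code on the device instead. The six-character code, five-minute lifetime, and `PairingStore` class are assumptions, and a real service would persist state and return an access token rather than a bare account ID.

```python
import secrets
import time

ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"  # skips 0/O and 1/I lookalikes
CODE_TTL_SECONDS = 300  # assumed code lifetime


class PairingStore:
    def __init__(self):
        self._pending = {}  # code -> (account_id, expires_at)
        self.bindings = {}  # headset_id -> account_id

    def issue_code(self, account_id, now=None):
        # Called by the web app for a logged-in user.
        now = time.time() if now is None else now
        code = "".join(secrets.choice(ALPHABET) for _ in range(6))
        self._pending[code] = (account_id, now + CODE_TTL_SECONDS)
        return code

    def redeem(self, code, headset_id, now=None):
        # Called by the VR client; pop() makes every code single use.
        now = time.time() if now is None else now
        entry = self._pending.pop(code.strip().upper(), None)
        if entry is None:
            return None
        account_id, expires_at = entry
        if now > expires_at:
            return None
        self.bindings[headset_id] = account_id
        return account_id
```

Single-use, short-lived codes limit the damage if a code is seen by someone else, and the reduced alphabet keeps entry on a VR keyboard less error-prone.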

Personalized AI Companions
Companion identity is fully user-defined: uploaded 3D avatar (VRM), voice sample (drives RVC), and a free-form personality description. The web dashboard ties these into a single companion profile that the VR client materializes at runtime.
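
One way to picture the single companion profile is as a small manifest that the web dashboard writes and the VR client reads. The field names, the `.vrm` check, and the JSON encoding below are assumptions for illustration, not the actual schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CompanionProfile:
    companion_id: str
    avatar_vrm_path: str    # uploaded 3D avatar
    voice_sample_path: str  # reference audio that drives RVC
    personality: str        # free-form description

    def validate(self) -> list:
        # Return human-readable problems; an empty list means the profile is usable.
        problems = []
        if not self.avatar_vrm_path.lower().endswith(".vrm"):
            problems.append("avatar must be a .vrm file")
        if not self.voice_sample_path:
            problems.append("voice sample is missing")
        if not self.personality.strip():
            problems.append("personality description is empty")
        return problems


def to_manifest(profile: CompanionProfile) -> str:
    return json.dumps(asdict(profile), sort_keys=True)


def from_manifest(raw: str) -> CompanionProfile:
    return CompanionProfile(**json.loads(raw))
```

Keeping the three user-supplied inputs in one record means the VR client needs only a single fetch to load a companion.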

Lip-Sync & Spatial Following
Built blendshape-based lip-sync driven by RVC-generated companion vocals plus a spatial-following controller so companions move naturally with the user inside the VR room.
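
The core of amplitude-driven lip-sync can be sketched in a few lines. This is a simplified illustration in Python rather than the Unity implementation: it assumes mono samples in the range -1 to 1, a single mouth-open blendshape weight between 0 and 1, and hypothetical `gain` and `smoothing` parameters.

```python
import math


def rms_frames(samples, frame_size: int):
    # Root-mean-square loudness for each full, non-overlapping frame.
    frames = []
    for i in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[i:i + frame_size]
        frames.append(math.sqrt(sum(s * s for s in frame) / frame_size))
    return frames


def mouth_weights(samples, frame_size: int = 160, gain: float = 4.0,
                  smoothing: float = 0.5):
    # Map loudness to a 0..1 blendshape weight, eased toward each new target
    # so the mouth does not snap open and shut between frames.
    weights = []
    previous = 0.0
    for level in rms_frames(samples, frame_size):
        target = min(1.0, level * gain)
        previous = previous + (1.0 - smoothing) * (target - previous)
        weights.append(previous)
    return weights
```

A fuller version would map phonemes or spectral features to several mouth shapes; loudness alone only opens and closes the mouth.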

Scripted Motion Behaviors
Authored a behavior layer for companions — entrance cues, idle gestures, song-section reactions — so each shared singing session feels performed, not just rendered.
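
A behavior layer like this can be modeled as a timeline lookup from playback time to the active cue. The section names, cue names, and fallback below are hypothetical placeholders, not the authored behaviors.

```python
from bisect import bisect_right

# Hypothetical mapping from song section to companion cue.
SECTION_CUES = {
    "intro": "entrance_walk",
    "verse": "idle_sway",
    "chorus": "arms_up",
    "outro": "bow",
}
DEFAULT_CUE = "idle_sway"


def build_timeline(sections):
    # sections: iterable of (start_seconds, section_name) pairs, in any order.
    ordered = sorted(sections)
    starts = [start for start, _ in ordered]
    cues = [SECTION_CUES.get(name, DEFAULT_CUE) for _, name in ordered]
    return starts, cues


def cue_at(timeline, t: float):
    # Find the last section that started at or before time t.
    starts, cues = timeline
    index = bisect_right(starts, t) - 1
    return cues[index] if index >= 0 else None
```

The client would call `cue_at` with the current playback position each time it needs to decide which animation to blend toward.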

Deliverables

🌐 Web App
Next.js + TypeScript + Prisma — account, companion configuration, library, device-code pairing

🎵 AI Song Pipeline
FastAPI + RunPod GPU workers + Replicate: Demucs / Whisper / pitch / harmony / RVC

🥽 VR Client
Unity / Quest 3 — VRM companions, lip-sync, spatial following, scripted motion

📄 Technical Whitepaper
System architecture, API contracts, and pipeline data flow for internal beta