WTF Voice — Agentic Voice Studio

Ten thousand calls. Zero humans. One voice.

A visual studio to design, deploy, and supervise a fleet of autonomous voice agents — agents with memory, tools, and knowledge that book appointments, collect payments, and escalate to humans. Meet Ananya: Indian English with natural Hinglish, self-improving call by call via Fitty intelligence.

LIVE: Ananya is on a call right now · 10,000+ calls / day · ≤ 800ms voice-to-voice · ₹6–10 per call · 60+ gyms live
[ 01 ]  What WTF Voice is

Not IVR. Not a bot. An actual voice workforce.

WTF Voice is a fleet of real-time conversational AI agents running on streaming speech-to-text → LLM → text-to-speech pipelines with sub-second perceived latency. It handles natural interruptions, detects when the human has finished speaking, and responds like a person — not a menu.
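As a rough illustration of how "detects when the human has finished speaking" can work, the sketch below ends a turn once speech is followed by a run of silent frames. The 20 ms frame size, the 400 ms silence threshold, and the function name are assumptions for illustration, not the production settings.

```python
from typing import Iterable, Optional

FRAME_MS = 20          # assumed VAD frame length
END_SILENCE_MS = 400   # assumed trailing silence that ends a turn


def end_of_turn_frame(is_speech: Iterable[bool]) -> Optional[int]:
    """Return the frame index at which the turn is considered over.

    A turn ends once speech has been heard and is then followed by
    END_SILENCE_MS of continuous silence. Returns None if that never happens.
    """
    needed = END_SILENCE_MS // FRAME_MS
    heard_speech = False
    silent_run = 0
    for i, speech in enumerate(is_speech):
        if speech:
            heard_speech = True
            silent_run = 0          # any speech resets the silence counter
        elif heard_speech:
            silent_run += 1
            if silent_run >= needed:
                return i
    return None


# 10 speech frames, a short 5-frame pause, 10 more speech frames, then silence.
frames = [True] * 10 + [False] * 5 + [True] * 10 + [False] * 30
turn_end = end_of_turn_frame(frames)
```

The short mid-sentence pause does not end the turn; only the longer trailing silence does. A real detector would likely tune the threshold per language and may use semantic cues as well.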

It handles both inbound calls (prospects, members, payments, complaints) and outbound campaigns (lead qualification, renewals, win-backs, reminders) — on local +91 numbers, in production, across 60+ WTF gym locations.

The default persona, Ananya, speaks Indian English with natural code-mixed Hinglish — because that's how members actually talk. She's backed by Sarvam's saarika:v2.5 for speech recognition and bulbul:v2 for voice synthesis, with VoxCPM voice-cloning for brand-specific personas.

Real-time streaming STT→LLM→TTS · Barge-in & turn detection · Natural Hinglish · Inbound + Outbound · Local +91 DIDs · Live in production
● LIVE CALL — ANANYA
LATENCY BUDGET
Transport (network): 50–150ms
Voice activity detection: 10–50ms
STT (saarika:v2.5): 100–250ms
LLM TTFT: 300–500ms
TTS first-audio (bulbul:v2): 100–200ms
Perceived voice-to-voice: ≤ 800ms
SARVAM · bulbul:v2
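The latency budget can be sanity-checked with simple arithmetic. The sketch below sums the stage ranges listed above as if the stages ran strictly one after another, which is an assumption; the dictionary keys are illustrative names.

```python
# Stage ranges in milliseconds, copied from the latency budget above.
BUDGET_MS = {
    "transport": (50, 150),
    "vad": (10, 50),
    "stt": (100, 250),
    "llm_ttft": (300, 500),
    "tts_first_audio": (100, 200),
}
TARGET_MS = 800

# Serial sums: every stage at its fastest, and every stage at its slowest.
best_case = sum(lo for lo, _ in BUDGET_MS.values())
worst_case = sum(hi for _, hi in BUDGET_MS.values())
headroom = TARGET_MS - best_case
```

Summed serially, the stages range from 560ms to 1,150ms. Staying at or under 800ms therefore presumably depends on stages landing toward the low end of their ranges or overlapping through streaming, rather than on every stage being allowed its upper bound.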
[ 02 ]  A call, in real time

It calls.
It closes.
It never sleeps.

Every conversation is a live streaming pipeline. The moment a member finishes speaking, voice activity detection (VAD) fires, the transcription streams, the LLM generates, and audio begins playing back, all within about 800 milliseconds of the member going quiet.

Barge-in detection means Ananya yields the moment a member starts speaking mid-sentence — no robotic wait-your-turn. It feels like a person because the architecture demands it.
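One way to picture barge-in is as a two-state machine: if the VAD reports caller speech while the agent is speaking, playback is cancelled and the agent goes back to listening. The class, method, and event names below are hypothetical and exist only for this sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TurnManager:
    """Toy turn-taking state machine: LISTENING <-> SPEAKING."""
    state: str = "LISTENING"
    events: List[str] = field(default_factory=list)

    def start_speaking(self) -> None:
        self.state = "SPEAKING"
        self.events.append("tts_started")

    def on_vad(self, user_is_speaking: bool) -> None:
        # Barge-in: the caller talks while the agent is mid-sentence.
        if user_is_speaking and self.state == "SPEAKING":
            self.events.append("tts_cancelled")
            self.state = "LISTENING"

    def finish_speaking(self) -> None:
        # Only counts as a clean finish if nobody interrupted.
        if self.state == "SPEAKING":
            self.events.append("tts_finished")
            self.state = "LISTENING"


tm = TurnManager()
tm.start_speaking()
tm.on_vad(False)     # caller silent: keep talking
tm.on_vad(True)      # caller interrupts: yield immediately
tm.finish_speaking() # no-op, playback was already cancelled
```

After this sequence the recorded events are `tts_started` followed by `tts_cancelled`, and the agent is back in the listening state.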

Sub-second response

Streaming STT → LLM → TTS with WebRTC transport. No polling, no buffering, no pauses.

Natural turn-taking

VAD-based barge-in lets members interrupt naturally. Ananya yields, listens, and re-engages.

Persona fidelity

VoxCPM voice-clone + Sarvam bulbul:v2 deliver a consistent, brand-calibrated voice every call.

[ 03 ]  The voice workforce, by the numbers
10,000+ · AI calls placed every single day
≤ 800ms · Perceived voice-to-voice latency
Connect rate on local +91 numbers
₹6–10 · Per call vs ₹50+ for a human agent
Callers who rate it "sounds human"
Renewal uplift vs human dialers
Average call QA score
60+ · Gyms live on the engine today
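To put the per-call economics in daily terms, the sketch below multiplies out the headline figures quoted on this page (10,000+ calls a day, ₹6–10 per AI call, ₹50+ per human-handled call). It treats those figures as exact, which is a simplification.

```python
CALLS_PER_DAY = 10_000
AI_COST_PER_CALL = (6, 10)   # rupees, low and high end of the quoted range
HUMAN_COST_PER_CALL = 50     # rupees, quoted as a lower bound

# Daily spend at each end of the AI cost range, and for human agents.
ai_daily = tuple(cost * CALLS_PER_DAY for cost in AI_COST_PER_CALL)
human_daily = HUMAN_COST_PER_CALL * CALLS_PER_DAY

# Saving even when the AI sits at the top of its range.
min_saving = human_daily - ai_daily[1]
```

On these assumptions the AI fleet costs ₹60,000 to ₹1,00,000 a day against at least ₹5,00,000 for human agents, a gap of at least ₹4,00,000 a day.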
[ 04 ]  Capabilities

Everything a voice team does. Automated.

// 01 · ENGINE

Real-time conversational engine

Streaming STT → LLM → TTS pipeline powered by Pipecat. Barge-in, turn detection, and latency-budget discipline are built in, targeting ≤ 800ms perceived voice-to-voice latency on every call.

// 02 · BUILDER

Drag-and-drop agent builder

A React Flow canvas where any operator can wire conversation nodes, branch conditions, and tool calls visually. First working bot live in under two minutes — no code required.
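A canvas like this ultimately serialises to a graph of nodes and edges. The sketch below shows one plausible shape for such a flow and a tiny interpreter that walks it; the node schema (`say`, `next`, `branches`, `tool`) and the function name are assumptions, not the product's actual export format.

```python
from typing import Callable, Dict, List, Optional

# A hypothetical serialised flow: prompts, branch conditions, and a tool call.
FLOW: Dict[str, dict] = {
    "greet": {"say": "Hi, this is Ananya.", "next": "ask_interest"},
    "ask_interest": {
        "say": "Would you like to book a trial?",
        "branches": {"yes": "book_trial", "no": "goodbye"},
    },
    "book_trial": {"say": "Booking your trial now.", "tool": "book_trial", "next": "goodbye"},
    "goodbye": {"say": "Thanks for your time!"},
}


def run_flow(flow: Dict[str, dict], start: str, answers: Dict[str, str],
             tools: Dict[str, Callable[[], None]]) -> List[str]:
    """Walk the graph from `start` and return the visited node ids in order."""
    visited: List[str] = []
    node_id: Optional[str] = start
    while node_id is not None:
        node = flow[node_id]
        visited.append(node_id)
        if "tool" in node:
            tools[node["tool"]]()                      # run the attached tool
        if "branches" in node:
            node_id = node["branches"][answers[node_id]]
        else:
            node_id = node.get("next")                 # None ends the call
    return visited


booked: List[bool] = []
path = run_flow(FLOW, "greet", {"ask_interest": "yes"},
                {"book_trial": lambda: booked.append(True)})
```

With a "yes" answer the walk visits greet, ask_interest, book_trial, and goodbye, and the booking tool fires once. In a live agent the branch would be chosen by the LLM's reading of the caller's reply rather than a fixed dictionary.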

// 03 · OUTBOUND

Outbound campaign dialer

Pacing engine, answering-machine detection, automatic retries and callbacks. DND, DLT, TRAI, and consent pre-flight before any dial. Manages 10K+ calls a day without a single human in the loop.
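Automatic retries usually come down to a schedule. The sketch below builds one with a doubling gap between unanswered attempts; the cap of three attempts, the 30-minute base delay, and the function name are assumed values for illustration only.

```python
from datetime import datetime, timedelta
from typing import List

MAX_ATTEMPTS = 3                     # assumed retry cap
BASE_DELAY = timedelta(minutes=30)   # assumed gap before the first retry


def retry_schedule(first_attempt: datetime) -> List[datetime]:
    """Times of every attempt, doubling the gap after each unanswered call."""
    times = [first_attempt]
    delay = BASE_DELAY
    for _ in range(MAX_ATTEMPTS - 1):
        times.append(times[-1] + delay)
        delay *= 2
    return times


schedule = retry_schedule(datetime(2025, 1, 6, 10, 0))
```

For a first attempt at 10:00 this yields attempts at 10:00, 10:30, and 11:30. A production dialer would also need to clip the schedule to permitted calling windows and stop as soon as a call connects.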

// 04 · INBOUND

Inbound routing on +91 DIDs

Local numbers across every WTF brand answered instantly by published agent workflows. No hold music, no IVR trees — straight to a live conversation with Ananya or any configured persona.

// 05 · INTELLIGENCE

"Fitty" intelligence layer

Ranks the daily call list by next-best-action probability. Audits and scores every completed call automatically. Learns from outcomes to sharpen the list tomorrow.
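The ranking step can be pictured as a sort over scored candidates. In the sketch below the `Lead` fields, the probability values, and the capacity cut-off are all hypothetical; how the scores are actually produced is not described here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Lead:
    member_id: str
    action: str
    probability: float  # assumed model score in [0, 1]


def rank_call_list(leads: List[Lead], capacity: int) -> List[Lead]:
    """Highest-probability leads first, truncated to today's dialing capacity."""
    return sorted(leads, key=lambda lead: lead.probability, reverse=True)[:capacity]


leads = [
    Lead("m1", "renewal", 0.42),
    Lead("m2", "win_back", 0.81),
    Lead("m3", "reminder", 0.65),
]
top = rank_call_list(leads, capacity=2)
```

Here the two highest-scoring members, m2 and m3, make the list. Feeding call outcomes back into the scoring model is what would sharpen tomorrow's ordering.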

// 06 · INTEGRATIONS

Knowledge base, payments & escalation

RAG over pgvector for live Q&A. Razorpay payment links sent mid-call. Seamless human-agent escalation with full call context, transcript, and sentiment handed off in real time.
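Retrieval over pgvector amounts to a nearest-neighbour search on embeddings. The sketch below reproduces that ranking in plain Python with toy 3-dimensional vectors; the table name `kb_chunks`, the column names, and the sample chunks are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Roughly what the database would run; table and column names are hypothetical.
# pgvector's `<=>` operator orders rows by cosine distance.
QUERY_SQL = "SELECT content FROM kb_chunks ORDER BY embedding <=> %s LIMIT 2"


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def top_k(query: Sequence[float], chunks: List[Tuple[str, Sequence[float]]], k: int) -> List[str]:
    """Return the text of the k chunks closest to the query embedding."""
    ranked = sorted(chunks, key=lambda chunk: cosine_distance(query, chunk[1]))
    return [text for text, _ in ranked[:k]]


CHUNKS = [
    ("Monthly plan pricing", [1.0, 0.0, 0.0]),
    ("Gym opening hours", [0.0, 1.0, 0.0]),
    ("Personal training add-on", [0.8, 0.0, 0.6]),
]
hits = top_k([0.9, 0.1, 0.0], CHUNKS, k=2)
```

The retrieved chunks would then be placed in the LLM prompt so the agent can answer from the knowledge base mid-call. Real embeddings have hundreds of dimensions and come from an embedding model, which this sketch leaves out.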

[ 05 ]  What it runs

Every touchpoint in the member lifecycle. Handled.

WTF Voice doesn't replace a telecaller for one use case — it replaces the entire inbound and outbound calling function across every stage of the member journey.

ACQUISITION

Lead qualification & sales

Calls back new enquiries within seconds, qualifies interest, pitches the right plan, and converts to a trial or paid membership.

RETENTION

Renewals & payment collection

Proactive renewal calls before expiry, overdue payment collection with live Razorpay link delivery, and confirmation follow-ups.

ENGAGEMENT

Reminders & check-ins

Visit reminders, BMI & re-test check-in calls, class schedule confirmations, and personalised health nudges based on member data.

RE-ACTIVATION

Win-back & birthday calls

Churned member win-back campaigns with personalised offers, birthday call sequences, and lapsed-visit re-engagement flows.

[ 06 ]  Built different

The voice-agent platform built to outrun Vapi & Retell.

No per-minute markup, no black box, no lock-in. The platform is proprietary and owned end-to-end, with compliance enforced in code rather than left to a settings panel.

// 01

Owned end-to-end

Every layer — agents, models, voice, data, infrastructure — is proprietary and built in-house. No black boxes we don't control, no per-seat tax.

// 02

India-first, Hinglish-native

Sarvam Indic speech models, local +91 numbers, code-mixed scripts — purpose-built for a billion-member market, not retrofitted.

// 03

Compliance hard-railed

TRAI TCCCPR 2018 + Feb 2025 amendment, DLT registration, DND scrubbing, and DPDP 2023 consent gates enforced in code before any dial fires.
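"Enforced in code" can be read as a gate that every dial request must pass. The sketch below is a simplified illustration: the three checks, the data structures, and the reason codes are assumptions, and it is not a statement of what the regulations require.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class DialRequest:
    phone: str
    campaign_id: str


def preflight(req: DialRequest, dnd: Set[str], consented: Set[str],
              dlt_registered_campaigns: Set[str]) -> List[str]:
    """Return the blocking reasons; an empty list means the dial may fire."""
    reasons: List[str] = []
    if req.phone in dnd:
        reasons.append("dnd")
    if req.phone not in consented:
        reasons.append("no_consent")
    if req.campaign_id not in dlt_registered_campaigns:
        reasons.append("dlt_unregistered")
    return reasons


DND = {"+919800000001"}
CONSENTED = {"+919800000002"}
DLT_CAMPAIGNS = {"renewals-q1"}

ok = preflight(DialRequest("+919800000002", "renewals-q1"), DND, CONSENTED, DLT_CAMPAIGNS)
blocked = preflight(DialRequest("+919800000001", "promo-x"), DND, CONSENTED, DLT_CAMPAIGNS)
```

The first request passes with no blocking reasons; the second is stopped on all three checks. Returning reasons rather than a bare boolean makes each blocked dial auditable.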

// 04

Model-agnostic

Swap STT, LLM, or TTS providers in one config line. Sarvam, ElevenLabs, or any Pipecat-compatible model — the router adapts as the leaderboard shifts.
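A one-line provider swap is typically backed by a registry keyed on provider name. The sketch below shows the pattern for TTS only; the registry, the decorator, and the stand-in factories are hypothetical and return tagged bytes instead of calling any real vendor API.

```python
from typing import Callable, Dict

Synthesize = Callable[[str], bytes]

# Hypothetical registry: provider name -> factory returning a synthesize function.
TTS_PROVIDERS: Dict[str, Callable[[], Synthesize]] = {}


def register_tts(name: str):
    def decorator(factory: Callable[[], Synthesize]):
        TTS_PROVIDERS[name] = factory
        return factory
    return decorator


@register_tts("sarvam")
def make_sarvam() -> Synthesize:
    # Stand-in for a real client; returns tagged bytes instead of audio.
    return lambda text: f"[sarvam]{text}".encode()


@register_tts("elevenlabs")
def make_elevenlabs() -> Synthesize:
    return lambda text: f"[elevenlabs]{text}".encode()


CONFIG = {"tts": "sarvam"}   # the single line an operator would change


def build_tts(config: Dict[str, str]) -> Synthesize:
    return TTS_PROVIDERS[config["tts"]]()


audio = build_tts(CONFIG)("Namaste")
```

Changing `CONFIG["tts"]` to `"elevenlabs"` switches the provider without touching the rest of the pipeline, because callers only ever see the shared `Synthesize` signature.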

// 05

Multi-provider telephony

Twilio, Vonage, Plivo, IVR Solutions — all supported. Bring your own numbers, your own SIP trunk, your own redundancy strategy.

// 06

Production, not a demo

Versioned, CI-deployed, runbook-backed. 60+ gyms live, 10K+ calls a day, shipping weekly. This is not a proof of concept.

[ 07 ]  The stack

Every component chosen for latency, cost-efficiency, and India-scale. Model-agnostic at every layer — swap without touching the pipeline.

Pipecat · FastAPI · Next.js 15 · React Flow · Sarvam saarika:v2.5 STT · Sarvam bulbul:v2 TTS · ElevenLabs · VoxCPM voice-clone · pgvector RAG · PostgreSQL · Redis · WebRTC · Twilio · Vonage · Plivo · IVR Solutions · Razorpay (payment links) · MCP server · Python SDK · TypeScript SDK · Private cloud or on-prem
[ 08 ]  Part of the autonomous stack

Voice is one part of a compounding loop.

Every call surfaces intent. That intent feeds messaging, which feeds video, which drives the leads that fill tomorrow's call list. One loop. No leakage.

Ready to deploy

Put Ananya to work today.

Deploy the autonomous voice workforce across your locations — or deploy it in your own private cloud.

PROPRIETARY · PRIVATE-CLOUD · PRODUCTION-GRADE