Ten thousand calls. Zero humans. One voice.
A visual studio to design, deploy, and supervise a fleet of autonomous voice agents — agents with memory, tools, and knowledge that book appointments, collect payments, and escalate to humans. Meet Ananya: Indian English with natural Hinglish, self-improving call by call via Fitty intelligence.
Not IVR. Not a bot. An actual voice workforce.
WTF Voice is a fleet of real-time conversational AI agents running on streaming speech-to-text → LLM → text-to-speech pipelines with sub-second perceived latency. It handles natural interruptions, detects when the human has finished speaking, and responds like a person — not a menu.
It handles both inbound calls (prospects, members, payments, complaints) and outbound campaigns (lead qualification, renewals, win-backs, reminders) — on local +91 numbers, in production, across 60+ WTF gym locations.
The default persona, Ananya, speaks Indian English with natural code-mixed Hinglish — because that's how members actually talk. She's backed by Sarvam's saarika:v2.5 for speech recognition and bulbul:v2 for voice synthesis, with VoxCPM voice-cloning for brand-specific personas.
It calls.
It closes.
It never sleeps.
Every conversation is a live streaming pipeline. The moment a member finishes speaking, voice activity detection (VAD) fires, the transcription streams, the LLM generates, and audio begins playing back, all within 800 milliseconds of the member falling silent.
Barge-in detection means Ananya yields the moment a member starts speaking mid-sentence — no robotic wait-your-turn. It feels like a person because the architecture demands it.
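One way to read the 800 ms figure is as a budget shared by the pipeline stages. The sketch below is only an illustration: the stage names and per-stage numbers are assumptions, not measurements from WTF Voice.

```python
# Hypothetical per-stage latency budget in milliseconds.
# The stage names and figures are assumptions for illustration only.
BUDGET_MS = 800

STAGE_BUDGET_MS = {
    "vad_endpointing": 200,       # deciding the member has stopped talking
    "stt_final_transcript": 150,  # last streamed transcript chunk
    "llm_first_token": 250,       # time to first generated token
    "tts_first_audio": 150,       # time to first synthesized audio frame
}


def check_budget(stages, budget_ms=BUDGET_MS):
    """Return (total, headroom) for a set of stage latencies."""
    total = sum(stages.values())
    return total, budget_ms - total


total_ms, headroom_ms = check_budget(STAGE_BUDGET_MS)
print(f"total={total_ms}ms headroom={headroom_ms}ms")
```

Because every stage streams, the budget applies to the first audio frame played back, not to the full reply.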
Sub-second response
Streaming STT → LLM → TTS with WebRTC transport. No polling, no buffering, no pauses.
Natural turn-taking
VAD-based barge-in lets members interrupt naturally. Ananya yields, listens, and re-engages.
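Barge-in can be pictured as a small state machine: if the VAD reports member speech while the agent is talking, playback is cancelled and the agent returns to listening. This is a minimal sketch with hypothetical names (`TurnManager`, `on_vad`), not the production implementation.

```python
from enum import Enum


class State(Enum):
    LISTENING = "listening"
    SPEAKING = "speaking"


class TurnManager:
    """Hypothetical turn-taking controller, for illustration only."""

    def __init__(self):
        self.state = State.LISTENING
        self.cancelled_utterances = 0

    def start_speaking(self):
        # Called when the first TTS audio frame is sent to the caller.
        self.state = State.SPEAKING

    def on_vad(self, user_is_speaking):
        # Barge-in: member speech during agent playback cancels the playback.
        if user_is_speaking and self.state is State.SPEAKING:
            self.cancelled_utterances += 1
            self.state = State.LISTENING
            return "cancel_tts"
        return "continue"


tm = TurnManager()
tm.start_speaking()
print(tm.on_vad(user_is_speaking=True))
```

After the cancel, the agent is back in the listening state, so further member speech is treated as a normal turn rather than another interruption.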
Persona fidelity
VoxCPM voice-clone + Sarvam bulbul:v2 deliver a consistent, brand-calibrated voice every call.
Everything a voice team does. Automated.
Real-time conversational engine
Streaming STT → LLM → TTS pipeline powered by Pipecat. Barge-in, turn detection, and latency-budget discipline baked in. Sub-800ms voice-to-voice, no exceptions.
Drag-and-drop agent builder
A React Flow canvas where any operator can wire conversation nodes, branch conditions, and tool calls visually. First working bot live in under two minutes — no code required.
Outbound campaign dialer
Pacing engine, answering-machine detection, automatic retries and callbacks. DND, DLT, TRAI, and consent pre-flight before any dial. Manages 10K+ calls a day without a single human in the loop.
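Automatic retries and callbacks can be sketched as a policy that maps a call outcome to the next attempt time. The outcome names, delays, and attempt cap below are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical retry policy: outcome names, delays, and the attempt cap
# are assumptions, not the dialer's real configuration.
MAX_ATTEMPTS = 3
RETRY_DELAY = {
    "busy": timedelta(minutes=15),
    "no_answer": timedelta(hours=2),
    "answering_machine": timedelta(hours=4),
}


def next_attempt(outcome, attempts_made, now):
    """Return when to redial, or None if the contact should not be retried."""
    if outcome not in RETRY_DELAY:
        return None  # e.g. "answered": nothing to retry
    if attempts_made >= MAX_ATTEMPTS:
        return None  # attempt cap reached
    return now + RETRY_DELAY[outcome]


now = datetime(2025, 1, 6, 11, 0)
print(next_attempt("busy", attempts_made=1, now=now))
```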
Inbound routing on +91 DIDs
Local numbers across every WTF brand answered instantly by published agent workflows. No hold music, no IVR trees — straight to a live conversation with Ananya or any configured persona.
"Fitty" intelligence layer
Ranks the daily call list by next-best-action probability. Audits and scores every completed call automatically. Learns from outcomes to sharpen tomorrow's list.
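Ranking by next-best-action probability reduces, at its simplest, to sorting candidates by a model score. The sketch below assumes the scores already exist; the `Candidate` type and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    member_id: str
    action: str         # e.g. "renewal", "win_back"
    probability: float  # assumed model score in [0, 1]


def rank_call_list(candidates, limit):
    """Highest-probability candidates first, truncated to the daily capacity."""
    ranked = sorted(candidates, key=lambda c: c.probability, reverse=True)
    return ranked[:limit]


# Made-up sample data for illustration.
todays_candidates = [
    Candidate("m-001", "renewal", 0.42),
    Candidate("m-002", "win_back", 0.17),
    Candidate("m-003", "renewal", 0.81),
]
for c in rank_call_list(todays_candidates, limit=2):
    print(c.member_id, c.action, c.probability)
```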
Knowledge base, payments & escalation
RAG over pgvector for live Q&A. Razorpay payment links sent mid-call. Seamless human-agent escalation with full call context, transcript, and sentiment handed off in real time.
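The retrieval step behind RAG is a nearest-neighbour search over embeddings. In pgvector this is typically an `ORDER BY embedding <=> query` using its cosine-distance operator; the pure-Python sketch below mirrors that ordering on toy three-dimensional vectors. The documents and vectors are made up for illustration.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity, the quantity pgvector's <=> operator orders by."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def top_k(query, docs, k=2):
    """Return the text of the k documents closest to the query embedding."""
    ranked = sorted(docs, key=lambda d: cosine_distance(query, d[1]))
    return [text for text, _ in ranked[:k]]


# Toy knowledge base: (text, embedding). Real embeddings come from a model.
DOCS = [
    ("Monthly plan pricing", [0.9, 0.1, 0.0]),
    ("Gym opening hours", [0.1, 0.9, 0.0]),
    ("Membership freeze policy", [0.0, 0.2, 0.9]),
]

print(top_k([1.0, 0.0, 0.0], DOCS, k=1))
```

The retrieved passages would then be placed in the LLM prompt so the answer is grounded in the knowledge base.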
Every touchpoint in the member lifecycle. Handled.
WTF Voice doesn't replace a telecaller for one use case — it replaces the entire inbound and outbound calling function across every stage of the member journey.
Lead qualification & sales
Calls back inbound enquiries within seconds, qualifies interest, pitches the right plan, and converts to a trial or paid membership.
Renewals & payment collection
Proactive renewal calls before expiry, overdue payment collection with live Razorpay link delivery, and confirmation follow-ups.
Reminders & check-ins
Visit reminders, BMI & re-test check-in calls, class schedule confirmations, and personalised health nudges based on member data.
Win-back & birthday calls
Churned member win-back campaigns with personalised offers, birthday call sequences, and lapsed-visit re-engagement flows.
The voice-agent platform built to outrun Vapi & Retell.
No per-minute markup, no black box, no lock-in. Proprietary, owned end-to-end, with compliance hard-railed in code, not in a settings panel.
Owned end-to-end
Every layer — agents, models, voice, data, infrastructure — is proprietary and built in-house. No black boxes we don't control, no per-seat tax.
India-first, Hinglish-native
Sarvam Indic speech models, local +91 numbers, code-mixed scripts: purpose-built for a billion-person market, not retrofitted.
Compliance hard-railed
TRAI TCCCPR 2018 + Feb 2025 amendment, DLT registration, DND scrubbing, and DPDP 2023 consent gates enforced in code before any dial fires.
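"Enforced in code" can be read as a pre-flight function that must return no blocking reasons before the dialer is allowed to place a call. The sketch below reduces each gate to a boolean, which is a deliberate simplification; the field names are hypothetical and the real rules carry more nuance.

```python
from dataclasses import dataclass


@dataclass
class DialRequest:
    phone: str
    on_dnd_registry: bool       # result of a DND scrub (assumed precomputed)
    has_recorded_consent: bool  # consent record on file
    dlt_registered: bool        # campaign registered on DLT


def preflight(req):
    """Return the reasons a dial must be blocked; an empty list means clear."""
    blocks = []
    if req.on_dnd_registry:
        blocks.append("dnd")
    if not req.has_recorded_consent:
        blocks.append("no_consent")
    if not req.dlt_registered:
        blocks.append("dlt_unregistered")
    return blocks


def may_dial(req):
    # The dialer only proceeds when every gate passes.
    return not preflight(req)


# Placeholder number, not a real contact.
request = DialRequest("+910000000000", on_dnd_registry=False,
                      has_recorded_consent=True, dlt_registered=True)
print(may_dial(request))
```

Returning the full list of reasons, rather than failing on the first one, leaves an audit trail of why a contact was skipped.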
Model-agnostic
Swap STT, LLM, or TTS providers in one config line. Sarvam, ElevenLabs, or any Pipecat-compatible model — the router adapts as the leaderboard shifts.
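A one-line provider swap could look like the following, where the pipeline is described by a small immutable config. The `PipelineConfig` shape, the identifier format, and the LLM and ElevenLabs identifiers are placeholders, not the platform's actual schema.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical config shape: one provider identifier per stage."""
    stt: str
    llm: str
    tts: str


default = PipelineConfig(
    stt="sarvam/saarika:v2.5",
    llm="placeholder-llm",  # the LLM is not named in this page
    tts="sarvam/bulbul:v2",
)

# Swapping the TTS provider touches a single field; the rest is unchanged.
swapped = replace(default, tts="elevenlabs/placeholder-voice")
print(swapped)
```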
Multi-provider telephony
Twilio, Vonage, Plivo, IVR Solutions — all supported. Bring your own numbers, your own SIP trunk, your own redundancy strategy.
Production, not a demo
Versioned, CI-deployed, runbook-backed. 60+ gyms live, 10K+ calls a day, shipping weekly. This is not a proof of concept.
Every component chosen for latency, cost-efficiency, and India-scale. Model-agnostic at every layer — swap without touching the pipeline.
Voice is one part of a compounding loop.
Every call surfaces intent. That intent feeds messaging, which feeds video, which drives the leads that fill tomorrow's call list. One loop. No leakage.
WhatsApp CRM
Call outcomes, callbacks, and consent sync straight into the inbox. A self-hosted Meta Cloud API CRM — no BSP, no markup.
Video Engine
What converts on the call becomes the brief for the next batch of Reels. 100 brand videos a week, for 30 cents each.
Put Ananya to work today.
Deploy the autonomous voice workforce across your locations — or deploy it in your own private cloud.
PROPRIETARY · PRIVATE-CLOUD · PRODUCTION-GRADE