TL;DR
Alongside Phorest's existing FrontDesk AI, we built an MVP booking agent using ElevenLabs and a custom MCP server that connects it to the Phorest salon API. The agent supports both voice calls and WhatsApp, so clients can book appointments through natural conversation: the agent looks up their profile, lists services, checks availability, and confirms the booking.
The Problem
Phorest already has FrontDesk AI, our AI-powered assistant that handles client interactions via chat. Adjacent to that, we wanted to explore what a voice-first booking experience could look like.
Phone bookings still tie up front-of-house staff, and for clients, calling means waiting on hold or navigating menus. Could an AI agent handle the full booking flow (understanding what the client wants, finding availability, and making the booking) through a natural conversation, whether by voice call or WhatsApp message?
The Solution: Voice AI + MCP + Phorest API
On top of the same Phorest API that powers FrontDesk AI, we built a separate interface in three layers:
- ElevenLabs: powers both the voice agent (STT, LLM reasoning, TTS) and the WhatsApp integration (text-based conversational AI)
- MCP Server: our custom glue layer, exposing Phorest API actions as tools the AI agent can call
- Phorest API: the salon backend (clients, services, availability, bookings)
MCP (Model Context Protocol) is an open standard that lets AI agents call external tools. We built a server that wraps all the relevant Phorest endpoints as callable tools, so the agent can handle the same core booking workflows as FrontDesk AI, whether the client is on a call or messaging on WhatsApp.
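To make "wrapping an endpoint as a tool" concrete, here is a minimal, SDK-free sketch. In MCP, each tool is advertised with a name, a description, and a JSON Schema for its inputs, and the agent then calls it by name with a dictionary of arguments. The registry, the `list_services` tool name, and the placeholder data below are all hypothetical. This is an illustration rather than the actual server code, and a real server would most likely use an MCP SDK instead of a hand-rolled registry.

```python
from typing import Any, Callable

# Hypothetical in-memory registry, for illustration only.
TOOLS: dict[str, dict[str, Any]] = {}


def tool(name: str, description: str, input_schema: dict[str, Any]) -> Callable:
    """Register a function as a callable tool with a JSON Schema for its inputs."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        TOOLS[name] = {
            "name": name,
            "description": description,
            "inputSchema": input_schema,
            "handler": fn,
        }
        return fn
    return decorator


@tool(
    name="list_services",  # hypothetical tool name
    description="List bookable services at a branch.",
    input_schema={
        "type": "object",
        "properties": {"branch_id": {"type": "string"}},
        "required": ["branch_id"],
    },
)
def list_services(branch_id: str) -> list[dict[str, str]]:
    # Placeholder data; a real handler would call the Phorest API here.
    return [{"serviceId": "svc-1", "name": "Cut & Blow Dry", "branchId": branch_id}]


def list_tools() -> list[dict[str, Any]]:
    """Tool metadata as the agent would see it (handlers are not exposed)."""
    return [{k: v for k, v in t.items() if k != "handler"} for t in TOOLS.values()]


def call_tool(name: str, arguments: dict[str, Any]) -> Any:
    """Dispatch a tool call from the agent to the matching handler."""
    if name not in TOOLS:
        raise KeyError(f"Unknown tool: {name}")
    return TOOLS[name]["handler"](**arguments)
```

The agent only ever sees the output of `list_tools()` and the results of `call_tool(...)`, so everything specific to Phorest stays behind the handler.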
Key Challenges
Getting the transport layer right
Our first attempt used stdio transport (standard input/output), which worked locally. But ElevenLabs requires a public URL and can't talk to a local process. We switched to SSE (Server-Sent Events) transport over HTTP, which gives the agent a proper endpoint to connect to. During development, we used ngrok to expose the local server publicly.
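For readers unfamiliar with SSE, the wire format is simple: the server keeps an HTTP response open with content type `text/event-stream` and writes events as `event:` and `data:` lines, each event terminated by a blank line. The helper below is a generic sketch of that framing. The function name is hypothetical, and in practice an MCP SDK or web framework would handle this for you.

```python
import json
from typing import Any, Optional


def format_sse(data: Any, event: Optional[str] = None) -> str:
    """Serialise one message in the text/event-stream wire format."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # json.dumps emits a single line by default, so one data field is enough.
    lines.append(f"data: {json.dumps(data)}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"
```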
ElevenLabs can't handle complex tool parameters
This one took a while to figure out. ElevenLabs' LLM can't pass nested objects or arrays as tool parameters: it serialises them as plain strings instead of structured JSON. So any tool that expected a nested structure would break.
The fix: all tool parameters must be flat primitive types only. We moved any array or object construction server-side, so the agent passes simple values and the MCP server assembles the correct request bodies before calling Phorest.
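As an illustration of that pattern, the sketch below takes flat string arguments (including a comma-separated list of service IDs) and assembles a nested body on the server. Only `clientServiceSelections` comes from the Phorest request shape described in this post. The function name, the `date` field, and the inner `serviceSelections` / `serviceId` keys are assumptions made for the example and should be checked against the Phorest docs.

```python
from typing import Any


def build_availability_body(client_id: str, service_ids_csv: str, date: str) -> dict[str, Any]:
    """Turn flat, string-only tool arguments into a nested request body."""
    # The agent passes "svc-1, svc-2" as one string; split it server-side.
    service_ids = [s.strip() for s in service_ids_csv.split(",") if s.strip()]
    if not service_ids:
        raise ValueError("At least one service ID is required")
    return {
        "date": date,  # assumed field name
        "clientServiceSelections": [
            {
                "clientId": client_id,
                # Inner key names are assumptions for this sketch.
                "serviceSelections": [{"serviceId": sid} for sid in service_ids],
            }
        ],
    }
```

The tool schema the agent sees then contains only strings, and the array and object construction never leaves the server.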
Getting the API request shapes right
The Phorest docs aren't always obvious about exact request body structures. For example, the availability endpoint needs a nested clientServiceSelections array (not a flat list of service IDs), and the booking endpoint uses clientAppointmentSchedules with clientId appearing at multiple levels. Working through these took some trial and error.
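A rough sketch of what assembling the booking body can look like is shown below. `clientAppointmentSchedules` and the repeated `clientId` are taken from the description above. Everything else (the function name, `serviceSchedules`, and the `startTime` / `endTime` / `staffId` keys on the slot) is an assumption for illustration and not a verified Phorest schema.

```python
from typing import Any


def build_booking_body(client_id: str, service_id: str, slot: dict[str, str]) -> dict[str, Any]:
    """Assemble a booking request from a slot returned by an availability check."""
    return {
        "clientId": client_id,  # top-level client reference
        "clientAppointmentSchedules": [
            {
                "clientId": client_id,  # repeated inside each schedule
                "serviceSchedules": [  # assumed key name
                    {
                        "serviceId": service_id,
                        # Copy the slot values through unchanged.
                        "startTime": slot["startTime"],
                        "endTime": slot["endTime"],
                        "staffId": slot["staffId"],
                    }
                ],
            }
        ],
    }
```

Copying the slot values through unchanged, rather than letting the agent restate them, avoids small mismatches in times or staff IDs between the availability check and the booking call.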
What the Agent Can Do
| Capability | What it does |
|---|---|
| Client lookup & creation | Find an existing client by name, email, or phone, or register a new one |
| Service browsing | List all bookable services at a branch |
| Availability check | Find open appointment slots for a given service and date |
| Booking management | Create, confirm, or cancel an appointment |
| Staff & schedule lookup | Check which staff are available and their working hours |
| Appointments view | List upcoming appointments for a client |
| Branches, products, vouchers | Additional tools for broader salon operations |
In total, the MCP server exposes 19 tools that the voice agent can call during a conversation.
What We Learned
- Keep tool parameters flat. ElevenLabs (and likely other voice AI platforms) can't reliably handle nested structures. Build complexity server-side and keep the agent interface simple.
- Always check availability before booking. The availability endpoint returns the exact start time, end time, and staff ID. Pass these straight into the booking call for reliable results.
- All Phorest API times are UTC. The salon's local timezone is separate from the API timezone, so the agent needs to convert between the two when communicating appointment times.
- SSE transport works well for voice agents. Each agent session gets its own MCP server instance, keeping state isolated across concurrent conversations.
- ngrok is great for development, but plan for production. Free-tier URLs change on every restart, so a proper deployment (Railway, Fly.io) is needed beyond prototyping.
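On the UTC point above, the conversion itself is small but easy to forget. The sketch below assumes the API returns ISO 8601 timestamps (the exact format is an assumption), and the function name and spoken format are hypothetical. It takes any `tzinfo`; in practice the salon's zone would likely come from `zoneinfo.ZoneInfo` with the branch's IANA timezone name, so that daylight saving is handled automatically.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def to_salon_local(utc_iso: str, salon_tz: tzinfo) -> str:
    """Convert a UTC ISO 8601 timestamp into a readable salon-local time."""
    # Python versions before 3.11 do not accept a trailing "Z" in fromisoformat.
    parsed = datetime.fromisoformat(utc_iso.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Treat naive timestamps as UTC, per the API convention noted above.
        parsed = parsed.replace(tzinfo=timezone.utc)
    local = parsed.astimezone(salon_tz)
    return local.strftime("%A %d %B at %H:%M")


# Fixed UTC+1 offset used purely as an example zone.
example_tz = timezone(timedelta(hours=1))
print(to_salon_local("2025-07-01T13:00:00Z", example_tz))  # Tuesday 01 July at 14:00
```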
What's Next
This MVP validates that conversational booking, via both voice and WhatsApp, works end-to-end with the Phorest API. Next steps to move beyond prototype:
- Add webhook support for real-time booking confirmations
- Explore how learnings from this prototype could feed back into FrontDesk AI