TL;DR
We built an automated pipeline that exports 2,099 Kaia sales call transcripts from the Outreach API, uses Claude AI (Haiku) to score each transcript against 15 product features, uploads the scored data to a Notion database, and generates an interactive HTML dashboard showing how client feature requests trend over time. The goal: validate whether shipping a feature actually reduces the number of times clients ask for it on demo calls.
The Problem
Phorest's product team ships features on a regular cadence, but had no quantitative way to measure whether a release actually reduced client demand for that feature. The only signal was anecdotal, reps saying "people ask about X a lot" in Slack.
Meanwhile, every sales demo is recorded via Outreach Kaia, producing full transcripts with timestamps and speaker labels. Thousands of these were sitting untapped.
- 2,099 qualifying recordings across 17 months (Nov 2024 – Mar 2026)
- Each transcript is thousands of lines long, impossible to read manually
- We needed to distinguish a rep mentioning a feature from a client requesting it
- Results needed to be trackable over time with release-date markers
- The process needed to be repeatable as new calls come in
The Solution: Four-Stage Pipeline
Stage 1, Outreach API Export
Connects via OAuth2 refresh-token flow, queries /api/v2/kaiaRecordings, and exports all qualifying recordings to a CSV. Filters: startTime Nov 2024–Apr 2026, duration ≥30 min, provider = Zoom or Google Meet.
Output: kaia-transcripts.csv, 224 MB, 2,099 rows. export-transcripts.mjs handles pagination, rate limiting, and streaming CSV output with embedded newlines.
Stage 2, AI Scoring with Claude API
For each of the 2,099 transcripts, sends the full text to Claude Haiku with a structured scoring prompt. The AI returns a 0/1 score for each of 15 tracked features, a call overview, and specific client quotes for any feature scored 1.
Score = 1 only if a client genuinely requests or expresses need for the feature. If only the sales rep mentions it during a demo walkthrough, score = 0.
This distinction is essential, we're measuring unmet client demand, not rep pitch frequency.
The 15 features tracked
Send Payment Links · Memberships 2.0 · Add-ons & Zero-minute services · Send Consultation Form on booking · OB Strict Gap Time · Custom Break Blocks · Print Travelers for stylists · Gift Voucher Commissions · Business Level Commission · Log of notifications in Go · Finishing time for a single service · Additional Gender Settings · Go, restrict 'my numbers' history · Aveda+ Rewards Integration · Sell memberships online
- Model: Claude Haiku (200K context, ~$0.001 per transcript)
- Concurrency: 3 parallel API calls with exponential backoff
- Checkpoint system: after each successful record the ID is saved to
processed-ids.json; re-running automatically skips completed records, no duplicates, no lost work
Stage 3, Notion Database Population
For each scored transcript, creates a Notion page with the call title, date, recording ID, 15 number columns (0/1) per feature, and a 2–4 sentence call overview. For any feature scored 1, the exact client quote and context are appended.
This works as both a filterable dashboard (sort/group by feature, filter by month) and a drill-down tool, click any row to read what the client actually said.
Stage 4, Interactive HTML Dashboard
- Summary cards: total calls, features tracked, date range, overall mention rate
- Per-feature trend charts, % of calls per month, with vertical dashed release-date markers
- Monthly breakdown table, 15 features × 17 months, colour-coded (green = low, red = high)
- Toggle controls, % vs raw counts, monthly vs quarterly, trend % visibility
Trend calculation: linear regression across all months (not just first vs last). The slope is expressed as a % change relative to the mean, giving an accurate "is demand actually rising or falling" signal that accounts for month-to-month noise.
Results
| Before | After |
|---|---|
| No data on what clients actually ask for on calls | 2,099 transcripts scored across 15 features with client quotes |
| Anecdotal "reps say people want X" | Quantified: exact % of calls per month per feature |
| No way to measure feature release impact | Before/after trend lines with release-date markers |
| Would take weeks to read transcripts manually | AI scored all 2,099 in ~4.5 hours for ~$2 |
| One-off analysis | Repeatable pipeline, re-run anytime with new data |
What We Learned
- Client-requested vs rep-mentioned is the key distinction. Without it, every demo call would score 1 on most features since reps walk through the product. The AI is surprisingly good at telling them apart.
- Checkpoint/resume is essential for large batch jobs. The Claude API rate-limited us frequently. Without the checkpoint file, any interruption would mean re-processing (and duplicating) hundreds of records.
- Haiku is good enough for structured scoring. We tested Sonnet (10× the cost), scores were nearly identical for this use case. Haiku's 200K window handles even the longest transcripts in a single call.
- Linear regression beats first-vs-last for trend calculation. Initial approach compared first 3 months to last 3, misleading for features that spiked mid-period.
- Partial months skew data. Worth noting when interpreting the latest data points.
- The dashboard is a conversation starter, not the final answer. The real value is clicking into individual Notion pages and reading the actual client quotes.
Tech Stack
- Outreach API, OAuth2 + REST for Kaia recording export
- Node.js,
export-transcripts.mjs,score-and-upload.mjs - Claude Haiku API, transcript scoring (~$0.001/transcript)
- Notion API, database population via MCP integration
- Chart.js, interactive dashboard charts
- GitHub Pages, dashboard hosting