Lisbon, headphones on. The air thick with that specific damp winter smell, maybe. I was listening to one of our early beta users.
She was a product director from a Series B startup. Smart as hell. Her answer to our practice prompt, a mock board presentation on quarterly growth, was structurally flawless. Perfect, even. Clear thesis, supporting data points, strong close.
Our AI gave her a stellar score.
Then I played the audio.
She said “um” twenty-three times in ninety seconds. Her pace swung between racing through numbers at 210 words per minute and grinding to a near-halt when she hit anything she wasn’t sure about. Stretches where her voice got so quiet it almost disappeared.
None of that showed up in our assessment. For the first four months of building ExecReps, our AI couldn’t hear anyone. It could only read.
I keep coming back to that moment whenever someone asks about finding product-market fit. The thing that makes your product ‘real’ isn’t always the thing you build first. Sometimes it’s the thing you were accidentally ignoring.
The Text-Only Era
Let me back up. When we launched v0.1 in July 2025, the core loop was simple: you record yourself speaking, we transcribe it, GPT-4 evaluates the transcript, you get feedback. Record, Transcribe, Assess, Improve.
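For the curious, here’s roughly what that loop looked like in code. A minimal sketch using the OpenAI Python client; the rubric prompt is illustrative, not our production prompt.

```python
# Sketch of the v0.1 text-only loop: transcribe, then ask GPT-4 to assess.
# Assumes the official OpenAI Python client; rubric wording is illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

RUBRIC = (
    "You are an executive communication coach. Assess this transcript for "
    "thesis clarity, supporting evidence, and strength of close. "
    "Return a 0-100 score with brief feedback."
)

def assess_transcript(transcript: str) -> str:
    """Text-only assessment: the AI reads the user, but never hears them."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": transcript},
        ],
    )
    return response.choices[0].message.content
```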
If you’ve read Nir Eyal’s Hook Model, you’ll recognise that structure. External trigger (the prompt), action (record yourself), variable reward (personalised AI feedback), investment (your practice history building over time). It worked.
GPT-4 is good at analysing structure, directness, and evidence. Users were coming back daily, practising pitch decks, preparing for all-hands, rehearsing difficult conversations. BJ Fogg’s B=MAP model would have given us decent marks: motivation was there (real presentations coming up), ability was easy (speak for two minutes, get instant feedback), and the prompt was the product itself.
I almost fell into a trap here. The product was working. Users were engaged. Growth was happening. Every instinct in my PM brain said ‘don’t touch it. Optimise what’s working.’
But something gnawed at me. We were building a communication coaching tool that only analysed what people said, not how they said it. We were a voice product that was functionally deaf.
This reminds me of a conversation I had with a guest on the Product Coalition podcast a while back, talking about the gap between perceived and actual performance. In Don Norman’s terms, we’d created a massive Gulf of Evaluation. Users would record themselves, get text-based feedback, and think “great, I’m improving.” They had no way to evaluate their actual delivery. That’s the part that mattered most for their career. The system was hiding their most critical signal from them.
The Origin Story Collision
I need to tell you why this bothered me more than it probably should have.
Years ago, I graduated from university in London with a design degree. Portfolio polished, creative director at a prestigious agency loved my work. Then the call came: “Jay, I have to be honest with you. Your work is great. Your attitude is great. But we can’t employ people who talk like you.”
My working-class Cockney accent. That was it.
Not what I said. How I sounded.
I was building a product whose entire origin story was about the injustice of being judged by your delivery, and the product itself could only judge content. The irony was not lost on me.
Every time I reviewed a user submission, I saw a clean transcript score next to audio that told a completely different story: rushed pace, vocal fry, filler words stacking up. I felt the gap widen. Kahneman calls this WYSIATI: ‘What You See Is All There Is.’ Our users were suffering from it. They’d look at their text-based score, see nothing flagged about delivery, and conclude their delivery was fine. The investor who later told us “I thought I sounded the same the whole way through”? That’s WYSIATI in action. If the system doesn’t show you a problem, your brain concludes no problem exists.
Going Acoustic
In September 2025, I started researching acoustic analysis providers. The idea of building it ourselves lasted about a week. Training acoustic models from scratch is a rabbit hole that has swallowed entire startups. Build vs. buy was an easy call.
AssemblyAI became our answer: pace detection, filler word identification, pause analysis, and the raw data we’d need for confidence and clarity metrics.
The hard part wasn’t the API calls. It was deciding what to do with the data. AssemblyAI gives you a firehose. Every millisecond of audio has extractable features. Which ones matter most? I’m still not sure we’ve found the perfect answer here, but we’re getting closer.
Nielsen’s heuristic of “recognition rather than recall” shaped our thinking here. We didn’t want to dump a wall of acoustic data on users. We needed to surface metrics they’d immediately recognise as relevant. Pace variation they could hear in their own head, filler words they’d felt themselves say. The data had to match their fuzzy memory of the recording, then sharpen it.
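To make that concrete: most of the metrics we surfaced fall out of word-level timestamps. A minimal sketch using AssemblyAI’s Python SDK, with an illustrative filler list and pause threshold rather than our tuned values:

```python
# Sketch of delivery metrics pulled from word-level timestamps.
# Assumes AssemblyAI's Python SDK; thresholds here are illustrative.
import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"

FILLERS = {"um", "uh", "basically", "like", "so"}  # illustrative list

def delivery_metrics(audio_path: str) -> dict:
    # disfluencies=True keeps "um"/"uh" in the transcript instead of cleaning them out
    config = aai.TranscriptionConfig(disfluencies=True)
    transcript = aai.Transcriber().transcribe(audio_path, config=config)
    words = transcript.words  # each word carries start/end timestamps in ms

    duration_min = (words[-1].end - words[0].start) / 60_000
    filler_count = sum(1 for w in words if w.text.lower().strip(".,") in FILLERS)
    # A "long pause" here is any gap over 2s between consecutive words
    long_pauses = [
        (nxt.start - cur.end) / 1000
        for cur, nxt in zip(words, words[1:])
        if nxt.start - cur.end > 2_000
    ]
    return {
        "wpm": len(words) / duration_min,
        "filler_count": filler_count,
        "long_pauses": long_pauses,
    }
```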
The 80/20 Decision
This is the product decision I think about most from that period. I still wrestle with it sometimes.
We now had two signals: content quality and delivery quality. My first instinct was 50/50. That’s what every public speaking book tells you. Mehrabian’s “7–38–55 rule” even suggests delivery matters ‘more’.
Mehrabian’s research was about emotional communication, specifically cases where words and tone conflict, not professional presentations. When a product director presents quarterly numbers, the content matters more than her pacing. If the analysis is wrong, no amount of vocal confidence saves it.
We went with 80% content, 20% delivery. It felt conservative. Part of me wanted to weight delivery higher because that was our differentiator.
A product team that’s worked with the Kano Model would recognise the trap. Acoustic analysis was our ‘delighter,’ the feature that surprises and differentiates. Content quality was our ‘must-be,’ the foundational feature users expected to just work. Kano’s research is clear: over-invest in delighters at the expense of must-be quality and the whole product collapses.
The 80/20 split evolved to 60/40 by v1.3 as our delivery analysis matured. Starting conservative was right. You can always increase the weight of a signal as it improves. Decreasing it after users have relied on it is harder. That’s Kahneman and Tversky’s loss aversion at work. People feel losses roughly twice as intensely as equivalent gains. Telling a user “your delivery score now counts less” feels like taking something away. I wonder if we could have communicated that better, in retrospect.
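In code, the decision was almost embarrassingly small, which is part of the point. A sketch, with the weights living in config precisely so they could move from 80/20 to 60/40 without touching scoring logic:

```python
# Illustrative version of the content/delivery blend. Weights are config,
# not code, so the split can evolve as the delivery signal matures.
CONTENT_WEIGHT = 0.8   # v1.0 launch value; later 0.6
DELIVERY_WEIGHT = 0.2  # v1.0 launch value; later 0.4

def overall_score(content: float, delivery: float) -> float:
    """Blend two 0-100 signals into a single 0-100 score."""
    assert abs(CONTENT_WEIGHT + DELIVERY_WEIGHT - 1.0) < 1e-9
    return CONTENT_WEIGHT * content + DELIVERY_WEIGHT * delivery
```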
The First Real Analysis
I remember the exact day we ran the first complete acoustic analysis. Early November 2025, right before v1.0 went live.
A beta user doing a practice investor pitch. Transcript score was strong: clear problem statement, solid market sizing. Under the old system, he’d have gotten encouraging feedback and moved on.
The acoustic analysis told a different story. His pace averaged 178 WPM, fine, but variance was extreme. Racing through market opportunity at 220+ WPM (nerves) and crawling during financials (uncertainty). He used “basically” seventeen times. Two pauses over four seconds. Not power pauses, but the kind where someone loses their thread.
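Mechanically, that kind of finding falls out of the same word timestamps as above: compute WPM over a sliding window and flag segments far from the speaker’s average. The window size and tolerance here are illustrative.

```python
# Rough sketch of pace-swing detection over word objects with .start/.end in ms.
def pace_segments(words, window: int = 30) -> list[tuple[float, float]]:
    """Return (start_seconds, wpm) for each sliding window of `window` words."""
    segments = []
    for i in range(len(words) - window):
        chunk = words[i : i + window]
        minutes = (chunk[-1].end - chunk[0].start) / 60_000
        segments.append((chunk[0].start / 1000, window / minutes))
    return segments

def flag_swings(segments, avg_wpm: float, tolerance: float = 0.25):
    """Flag windows more than `tolerance` above or below the average pace."""
    return [
        (t, wpm) for t, wpm in segments
        if abs(wpm - avg_wpm) / avg_wpm > tolerance
    ]
```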
We showed him the combined assessment: here’s what you said (strong), here’s how you said it (needs work), here’s exactly where. He sent us a message I still have saved:
“I’ve given this pitch eleven times. This is the first time anyone told me I speed up when I’m excited and slow down when I’m unsure. I thought I sounded the same the whole way through.”
That message illuminated something about what Daniel Pink calls Mastery in his Drive framework. People can’t improve what they can’t see. The text-only version gave users mastery over ‘content.’ Delivery mastery was invisible. They had no feedback signal, no progress to track, no way to know if they were getting better or just more comfortable with bad habits. Acoustic analysis opened an entire mastery dimension that had been locked.
The Product Lesson: Finding Your 10x Moment
If you’re building a product, especially an AI product, there’s a concept I keep coming back to: the 10x moment. It’s the point where your product stops being “useful” and starts being ‘I can’t go back to not having this.’
Text-based feedback was useful. Valuable. Users came back.
Acoustic analysis made ExecReps ‘real.’ It was the difference between a product that told you ‘what to say differently’ and one that told you ‘how you sound when you say it.’
In Hook Model terms, acoustic analysis transformed our variable reward. Text-only feedback was becoming predictable. After a few sessions, users could roughly guess their content score. Eyal is explicit: predictable rewards lose their pull. Acoustic analysis reintroduced variability. Every recording surfaced something unexpected. ‘I didn’t know I did that’ is the most powerful variable reward a coaching product can deliver.
I’ve learned this about 10x moments: they’re usually hiding in the thing your product should do but doesn’t yet. They feel obvious in retrospect. ‘Of course a voice coaching tool should analyse the voice.’ When you’re in the weeds of building, though, the temptation to optimise what exists instead of building what’s missing is enormous.
The framework I use now is simple: “What is the most obvious thing our product should do that it currently can’t?” Not the clever thing. The ‘obvious’ thing. The thing a non-technical friend would ask about at dinner. ‘Wait, your voice coaching app doesn’t listen to people’s voices?’
That’s where your 10x moment lives.
What Acoustic Analysis Taught Me About the Mission
When communication coaching was something only Fortune 500 executives could access at $500–$1,000 an hour, those coaches were listening. They caught the pace changes, the filler words, the confidence drops. Human ears doing acoustic analysis with years of experience.
Every product that had tried to democratise this before us stopped at text. Grammar checkers, script analysers, AI writing assistants: all content, no delivery. None could hear you.
By going acoustic, we closed the gap between what a $1,000/hour coach provides and what everyone else has access to. That’s the mission we started with at Product Coalition, in a way: making product management knowledge accessible. This feels like a natural extension of that.
Executive presence isn’t about talking points. Anyone with ChatGPT can generate those. It’s pace and pause and conviction and clarity. It’s what got me rejected from that design agency. Deci and Ryan’s Self-Determination Theory says people need competence feedback to stay intrinsically motivated. Our text-only product gave users an incomplete competence signal, half the picture. Acoustic analysis completed it. When people feel competent, they don’t need external pressure to practise. They come back because they ‘want’ to.
What Came Next
Shipping v1.0 with acoustic analysis in November 2025 was just the beginning. Once we could hear, we needed to understand. The Voice Performance Score in v1.2 broke delivery into Command, Eloquence, Engagement, and Consistency. Then v1.3’s Executive Performance Score unified content and delivery into a single metric tracking progress over time.
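Structurally, you can picture those layers like this sketch. The four dimension names are real; the equal sub-weights and the blend are illustrative, not our shipped formulas.

```python
# Illustrative shape of the v1.2/v1.3 scoring layers.
from dataclasses import dataclass

@dataclass
class VoicePerformance:
    # The four v1.2 delivery dimensions; equal weighting is an assumption
    command: float
    eloquence: float
    engagement: float
    consistency: float

    def delivery_score(self) -> float:
        return (self.command + self.eloquence + self.engagement + self.consistency) / 4

def executive_performance_score(content: float, voice: VoicePerformance) -> float:
    # v1.3 unifies content and delivery into one tracked metric (60/40 blend)
    return 0.6 * content + 0.4 * voice.delivery_score()
```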
Each was its own battle. All downstream of the same decision: stop being deaf, start listening.
Four months of a product that could only read. One afternoon hearing a user who didn’t know she said “um” twenty-three times. And a bet that the obvious missing piece was worth the engineering investment.
Because the metrics were fine. The product was still incomplete.
That dissonance between what the data says and what your gut knows: if you’re building something that matters, learn to listen to it. Pun fully intended.