How to Create Your Own AI Siri Using Home Assistant and OpenClaw for Under $100

Way back in 2010, Apple purchased a little startup that produced a revolutionary virtual assistant called Siri. At the time, Siri’s voice recognition and capabilities showed real promise, but in the 16 years since, Siri has improved at a glacial pace. With the advent of ChatGPT, Claude, and other LLM-powered AI assistants, Siri’s inconsistency and limited capabilities are only amplified, and the once-promising voice assistant has become a sad punchline.

In 2024, Apple announced that it was developing a new version of Siri backed by LLMs. Since the announcement, AI Siri has seen multiple delays, the most recent of which pushed it into “late 2026.”

Despite Siri lagging far behind its competitors, I have HomePod devices scattered around my smart home. Their utility is limited to turning lights on and off, setting scenes, timers, and alarms, and playing music. The devices themselves sound great, but Siri really drags them down.

This is where the infomercial usually shows a frustrated person getting bombarded by a cascade of Tupperware falling from a cabinet. Y’all – “there’s got to be a better way!”

DIY AI Siri

A few weeks ago, I told you about my AI assistant Demerzel — an always-on AI I text through iMessage that manages my calendar, checks my email, controls my smart home, and even helps me write code. Demerzel grows more capable by the day as the OpenClaw open source project that powers her continues to evolve at a staggering pace. Her biggest shortcoming for day-to-day use? I can’t talk to her with my voice.

Until today.

The Hardware

Back in late 2024, Nabu Casa released the Home Assistant Voice Preview Edition (Voice PE) — a small, purpose-built device with a microphone array, speaker, and an ESP32-S3 chip. It’s designed to be a local, private voice assistant that integrates with your smart home. At $59, it’s significantly cheaper than an Echo or HomePod, and — critically — it processes your wake word on the device rather than streaming everything to the cloud.

Out of the box, it responds to “Hey Jarvis” or “Okay Nabu,” and offers a set of capabilities similar to HomePods and Siri. Last week I decided to explore connecting Demerzel to my Voice PE, unlocking a whole new world of voice capabilities.

Training a “Wake Word” Model

This is where it gets fun. The Voice PE uses a framework called microWakeWord — tiny neural network models that run on microcontrollers. To train a custom model, you need thousands of audio samples of someone saying your wake word, plus thousands of negative samples of other speech that isn’t your wake word.

I used a tool called microWakeWord Trainer for Apple Silicon that runs a local web app on your Mac. You record yourself saying the wake word a bunch of times, then it generates around 50,000 synthetic samples using text-to-speech with variations in pitch, speed, and accent. The whole training process ran locally on my M4 Mac Mini — no cloud GPUs, no Colab notebooks, no API costs. Just my Mac churning away for a couple hours.
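
To make the idea concrete, here’s a minimal sketch of the kind of pitch and speed augmentation such a trainer performs. Everything here is illustrative: the file names are hypothetical, and the real tool generates its samples via text-to-speech rather than from a single recording.

```python
# Sketch: generating pitch/speed variants of a wake word recording.
# Assumes the librosa and soundfile packages; file paths are illustrative.
import librosa
import soundfile as sf

# Load one spoken sample of the wake word (hypothetical file).
audio, sr = librosa.load("hey_demerzel.wav", sr=16000)

variant_id = 0
for rate in (0.9, 1.0, 1.1):        # playback speed variations
    for steps in (-2, 0, 2):        # pitch shift in semitones
        stretched = librosa.effects.time_stretch(audio, rate=rate)
        shifted = librosa.effects.pitch_shift(stretched, sr=sr, n_steps=steps)
        sf.write(f"hey_demerzel_{variant_id:03d}.wav", shifted, sr)
        variant_id += 1
```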

The result? A .tflite model file — about 50KB — that can detect “Hey Demerzel” with remarkable accuracy on a $5 microcontroller.
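
If you’re curious what the trained model looks like, you can poke at the .tflite file with TensorFlow’s interpreter. A quick sketch, assuming TensorFlow is installed and using a hypothetical model path (microWakeWord models consume streaming spectrogram features, so expect a small feature-window input rather than raw audio):

```python
# Sketch: inspecting a trained microWakeWord model with the TFLite interpreter.
# The model path is illustrative.
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="hey_demerzel.tflite")
interpreter.allocate_tensors()

for detail in interpreter.get_input_details():
    print("input:", detail["shape"], detail["dtype"])
for detail in interpreter.get_output_details():
    print("output:", detail["shape"], detail["dtype"])
```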

Wiring It All Together

Getting the trained model onto the Voice PE required flashing custom firmware via ESPHome. That sounds intimidating, but Home Assistant has a built-in ESPHome add-on that handles compilation and over-the-air updates. I wrote a custom YAML configuration referencing my wake word model, compiled it, and flashed the device — all from my browser.
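
For reference, the wake word portion of that YAML is only a few lines. This is a sketch, not my full device config, and the manifest URL is a placeholder for wherever you host the JSON file that accompanies the trained .tflite model:

```yaml
# Sketch of the micro_wake_word section of an ESPHome config.
# The model manifest URL is a placeholder.
micro_wake_word:
  models:
    - model: https://example.com/models/hey_demerzel.json
  on_wake_word_detected:
    - voice_assistant.start:
```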

The voice pipeline goes like this: the Voice PE detects “Hey Demerzel” locally, then streams my speech to Home Assistant for processing. From there, I built a custom WebSocket proxy that connects Home Assistant’s voice pipeline to Demerzel’s brain — the same AI that handles my text messages. The response streams back, gets converted to speech, and plays through the device’s speaker.
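
My actual proxy is tangled up with the rest of my setup, but the core idea fits in a page: accept transcribed text on one side, forward it to the assistant, and hand the reply back for text-to-speech. Here’s a minimal sketch; the Demerzel endpoint, port, and message format are all assumptions standing in for whatever your assistant exposes.

```python
# Sketch: a tiny WebSocket proxy between a voice pipeline and an AI backend.
# Endpoint URLs, ports, and the message schema are assumptions.
import asyncio
import json

import aiohttp
import websockets

DEMERZEL_URL = "http://localhost:8800/chat"  # hypothetical OpenClaw endpoint


async def ask_demerzel(text: str) -> str:
    """Forward transcribed speech to the assistant and return its reply."""
    async with aiohttp.ClientSession() as session:
        async with session.post(DEMERZEL_URL, json={"message": text}) as resp:
            data = await resp.json()
            return data["reply"]


async def handle_pipeline(websocket):
    """Receive transcript events from the voice pipeline, send back responses."""
    async for raw in websocket:
        event = json.loads(raw)
        if event.get("type") == "transcript":
            reply = await ask_demerzel(event["text"])
            await websocket.send(json.dumps({"type": "response", "text": reply}))


async def main():
    async with websockets.serve(handle_pipeline, "0.0.0.0", 8765):
        await asyncio.Future()  # run until cancelled


if __name__ == "__main__":
    asyncio.run(main())
```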

The entire round trip — wake word detection to spoken response — takes about 2-3 seconds. And because the wake word runs on-device, the microphone isn’t streaming audio anywhere until it hears the magic words.

Why This Matters Beyond My Living Room

Here’s what excites me about this: every piece of this stack is open source. The wake word framework, the training tools, the voice hardware, the home automation platform, the AI assistant framework. A year ago, building a custom voice AI assistant required a team of engineers and a cloud infrastructure budget. Today, a motivated tinkerer can do it on a Saturday afternoon for under $100.

We’re entering an era where the interface to AI is becoming as customizable as the AI itself. Your assistant doesn’t have to sound like Alexa or respond to “Hey Siri.” It can have whatever name you want, whatever personality you define, and whatever capabilities you wire up. That’s a fundamental shift from the walled-garden voice assistants we’ve lived with for the past decade.

The big tech companies built voice assistants as products. The open source community is building voice assistants as platforms. And platforms always win in the long run.

Jonathan LaCour