AI Phone
Today the most common hardware interface to AI is the phone. Other interfaces have not yet broken through because it is hard to compete with the utility and ease of access that phones provide. These thoughts led me to look again at the yellowing intercom phone next to the kitchen, which once opened the front door to the building but no longer works (everybody calls now anyway).
I wanted to breathe new life into this appliance while building a new way to interact with LLMs, one that would be useful and unobtrusive.
What if you could pick up the phone, immediately speak to an LLM, get the information you need, and then hang up?
Hardware
- ESP32-WROOM-32D, the “brain” with WiFi and an I2S interface for audio
- MAX98357, I2S DAC and Class D amplifier chip that drives the speaker
- INMP441, MEMS microphone with I2S output
Software
- On the ESP32 I run a C++ program that captures and streams audio packets
- A gateway service on my home server, written in Node.js, acts as middleware and manages the WebSocket connection to the OpenAI LLM API
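To make the capture side a bit more concrete, here is a minimal sketch of how raw microphone samples could be turned into fixed-size audio packets. It is not the actual firmware. It assumes the INMP441 delivers each 24-bit sample left-justified in a 32-bit I2S word, and it assumes a 16 kHz mono stream cut into 20 ms packets. The names toPcm16, Packetizer and kSamplesPerPacket are hypothetical, and the code is plain C++ without any ESP32 headers so it can be compiled on a desktop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Assumption: one raw I2S word carries a 24-bit sample left-justified in
// 32 bits. Keeping the top 16 bits yields signed 16-bit PCM.
// Note: right-shifting a negative value is arithmetic on mainstream
// compilers and guaranteed to be so from C++20 onward.
inline int16_t toPcm16(int32_t rawI2sWord) {
    return static_cast<int16_t>(rawI2sWord >> 16);
}

// Hypothetical packet size: 320 samples = 20 ms at an assumed 16 kHz.
constexpr std::size_t kSamplesPerPacket = 320;

// Collects converted samples and hands back packets of a fixed length.
class Packetizer {
public:
    explicit Packetizer(std::size_t samplesPerPacket = kSamplesPerPacket)
        : samplesPerPacket_(samplesPerPacket) {
        assert(samplesPerPacket_ > 0 && "packet size must be positive");
    }

    // Converts `count` raw I2S words and returns every packet that became
    // complete. Leftover samples stay buffered until the next call.
    std::vector<std::vector<int16_t>> push(const int32_t* raw, std::size_t count) {
        std::vector<std::vector<int16_t>> ready;
        for (std::size_t i = 0; i < count; ++i) {
            pending_.push_back(toPcm16(raw[i]));
            if (pending_.size() == samplesPerPacket_) {
                ready.push_back(std::move(pending_));
                pending_.clear();  // reset the moved-from buffer to empty
            }
        }
        return ready;
    }

    // Number of samples waiting for the current packet to fill up.
    std::size_t pendingSamples() const { return pending_.size(); }

private:
    std::size_t samplesPerPacket_;
    std::vector<int16_t> pending_;
};
```

On the device, the raw words would come from the I2S driver's read call, and each returned packet would be sent on to the gateway. Fixed-size packets keep latency predictable: at the assumed settings, no sample waits more than 20 ms before it can leave the ESP32.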
Further work
The usefulness of a wall-mounted AI phone is hard to deny, and it is somewhat surprising that there are no real commercial products in this form factor. In a way it is similar to having a smart speaker in the kitchen, but it feels more private and intentional.
There is obviously much more functionality to add. The number dial pad is just asking to be used: think of wiring MCP tools to different buttons, or triggering more complex n8n automations.
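As a rough idea of how the dial pad could be wired up, here is a small sketch of a key-to-action table. None of this exists yet. KeypadActions, bind and press are hypothetical names, and the actions are plain callbacks standing in for whatever would eventually call an MCP tool or an n8n automation.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical registry that maps a keypad character to a callback.
class KeypadActions {
public:
    using Action = std::function<std::string()>;

    // Registers (or replaces) the action bound to a key such as '1' or '#'.
    void bind(char key, Action action) {
        assert(action && "an action must be callable");
        actions_[key] = std::move(action);
    }

    // Runs the action bound to `key` and stores its result in `result`.
    // Returns false and leaves `result` untouched if the key is unbound.
    bool press(char key, std::string& result) const {
        auto it = actions_.find(key);
        if (it == actions_.end()) {
            return false;
        }
        result = it->second();
        return true;
    }

private:
    std::map<char, Action> actions_;
};
```

A binding could then look like `pad.bind('1', [] { return std::string("weather"); });`, with the returned string being, for example, the name of the tool the gateway should invoke.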