Advice on a project using microphone, speaker, HTTP APIs

Thread Starter


Joined Apr 19, 2021

I'm looking for some advice on a project I'm working on, and more specifically on the best microcontroller for it. I've already tried one approach but hit a number of hurdles and need to go back to square one.

The project is a physical desktop intercom that needs only a power supply and lets you talk to ChatGPT.

The program will record audio, send it via HTTP POST to the OpenAI API to transcribe speech to text, then use the API to get an answer to the question, then use an API to convert the answer from text to speech before playing it back on the speaker.
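The flow described above can be sketched as one function that chains the five steps. This is only an illustration in Python, not the original poster's firmware: the name `intercom_turn` and all five callables are hypothetical placeholders, and the real HTTP calls and audio I/O are left out.

```python
from typing import Callable


def intercom_turn(
    record: Callable[[], bytes],
    transcribe: Callable[[bytes], str],
    answer: Callable[[str], str],
    synthesize: Callable[[str], bytes],
    play: Callable[[bytes], None],
) -> str:
    """Run one question/answer cycle and return the answer text.

    Every step is passed in as a callable (all hypothetical), so each one
    can be swapped out or tested without real hardware or network access.
    """
    audio_in = record()               # capture the spoken question as audio bytes
    question = transcribe(audio_in)   # speech-to-text (an HTTP POST in the real project)
    reply = answer(question)          # get an answer to the transcribed question
    audio_out = synthesize(reply)     # text-to-speech
    play(audio_out)                   # hand the audio to the amp/speaker
    return reply


if __name__ == "__main__":
    # Stand-in steps so the sketch runs without a microphone or API key.
    played = []
    reply = intercom_turn(
        record=lambda: b"fake-wav-bytes",
        transcribe=lambda audio: "what is an ESP32?",
        answer=lambda question: f"You asked: {question}",
        synthesize=lambda text: text.encode("utf-8"),
        play=played.append,
    )
    print(reply)
```

Keeping the steps separate like this makes it easier to see which one is hitting the memory limit, and to replace just that step.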

To date, I have a breadboard with an ESP32, SD card module, microphone, amp and speaker. The challenge I've hit is that there is not enough memory on the ESP32 to send an audio file via HTTP POST. I have tried four different ways of sending multipart form data (text fields plus the audio file) to OpenAI, but all have failed, and the most successful ones failed due to insufficient memory. A short 5 second WAV file is 111KB; the max memory on the ESP32 is 512KB, but some of this is already in use. The HTTP request needs to read the audio file from the SD card into memory, then use memcpy to build a correctly formatted payload to send to the API. That means holding two copies of the audio, and 111KB * 2 is more than the ESP32 can handle, so it fails. This would obviously be even more of an issue with longer audio.
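One way around holding two full copies of the audio is to generate the multipart body piece by piece: the text fields, the file header, the file in small chunks, then the closing boundary. The sketch below shows that idea in Python rather than ESP32 code. The function name `multipart_parts`, the 4096-byte chunk size, and the `model`/`file` field values are assumptions for illustration, and it does not cover how the pieces are written to the socket or whether a given server wants a Content-Length header or chunked transfer.

```python
import io
import uuid
from typing import BinaryIO, Dict, Iterator

CHUNK_SIZE = 4096  # bytes read from the file per step; an arbitrary choice


def multipart_parts(fields: Dict[str, str], file_field: str, filename: str,
                    fileobj: BinaryIO, boundary: str,
                    chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield a multipart/form-data body one small piece at a time.

    Only one chunk of the file is held in memory at once, instead of the
    whole file plus a second copy inside a fully assembled payload.
    """
    # Plain text fields come first, each in its own part.
    for name, value in fields.items():
        yield (f"--{boundary}\r\n"
               f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
               f"{value}\r\n").encode("utf-8")
    # Header of the file part.
    yield (f"--{boundary}\r\n"
           f'Content-Disposition: form-data; name="{file_field}"; '
           f'filename="{filename}"\r\n'
           "Content-Type: audio/wav\r\n\r\n").encode("utf-8")
    # File contents, read and emitted chunk by chunk.
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk
    # Closing boundary ends the body.
    yield f"\r\n--{boundary}--\r\n".encode("utf-8")


if __name__ == "__main__":
    fake_wav = io.BytesIO(b"\x00" * 111_000)  # stand-in for the ~111KB recording
    boundary = uuid.uuid4().hex
    total = 0
    largest = 0
    for piece in multipart_parts({"model": "whisper-1"}, "file",
                                 "question.wav", fake_wav, boundary):
        total += len(piece)          # a real sender would write the piece to the socket here
        largest = max(largest, len(piece))
    print(f"body bytes: {total}, largest piece in memory: {largest}")
```

With this structure the peak buffer is roughly one chunk, so the length of the recording no longer determines how much RAM the upload needs.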

A few options I can see:

  1. Find a way to expand the memory of the ESP32, maybe using SPI RAM (PSRAM)
  2. Move to an entirely different microcontroller

Briefly looking at microcontrollers, a Raspberry Pi could work, but it has no ADC or DAC for the microphone and speaker, which presents more problems.

Any suggestions appreciated.


Joined Jan 27, 2019
Welcome to AAC.

First, take a look at the Lyra TD Smart Speaker platform from Espressif. It is designed for the job, and is based on the ESP32. [getting started documentation here]

Second, you may have to stream the audio to a server, which then forwards it to ChatGPT.

As far as RPi goes, there are at least two options:

  • USB devices with their own ADC/DAC
  • An RPi sound board “HAT” that takes analog input and interfaces via the 40-pin header, or a USB dongle version.
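With either option the microphone shows up on Linux as an ALSA capture device, so a recording can be made with the standard `arecord` tool. The sketch below only builds the command line. The helper name `arecord_command`, the device string `plughw:1,0`, and the 16 kHz mono settings are assumptions; the actual card number depends on the hardware and can be listed with `arecord -l`.

```python
import shlex
from typing import List


def arecord_command(out_path: str, seconds: int = 5,
                    device: str = "plughw:1,0",
                    rate: int = 16000, channels: int = 1) -> List[str]:
    """Build an arecord command that captures a fixed-length WAV file.

    The device name is a placeholder; check `arecord -l` for the real one.
    """
    return [
        "arecord",
        "-D", device,         # ALSA capture device (assumed: card 1, device 0)
        "-t", "wav",          # write a WAV container
        "-f", "S16_LE",       # 16-bit signed little-endian samples
        "-r", str(rate),      # sample rate in Hz
        "-c", str(channels),  # number of channels
        "-d", str(seconds),   # stop after this many seconds
        out_path,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, since this sketch cannot
    # know which capture device is present on a given board.
    print(shlex.join(arecord_command("question.wav")))
```

On the Pi itself, the resulting list can be passed to `subprocess.run(cmd, check=True)` to make the recording.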

The RPi, or possibly even the RPi Zero 2 W, would be a good choice. One advantage is a full Linux installation, which may make development somewhat easier.


Joined Oct 3, 2010
Did you try asking ChatGPT?
Maybe the question was longer than 5s.

You have reached (and, in cases >5s, apparently exceeded) the boundary of complexity between where a microcontroller is suitable and where a computer is suitable.

I would use a Raspberry Pi or another SBC for this.

Thread Starter


Joined Apr 19, 2021
Thanks all. I think I've reached the limit of the ESP32, so I'm moving the project to a Raspberry Pi. Hopefully it will be straightforward!