Advice on a project using microphone, speaker, HTTP APIs

teenflon5 · Feb 6, 2024

Hello,

I'm looking for advice on a project, and specifically on the best microcontroller for it. I've already tried one approach but hit a number of hurdles and need to go back to square one.

The project is a physical desktop intercom that needs only a power supply and lets you talk to ChatGPT.

The program will record audio, send it to OpenAI via HTTP POST to transcribe the speech to text, use the API to get an answer to the question, then use an API to convert the answer to speech and play it back on the speaker.

To date, I have a breadboard with an ESP32, SD card module, microphone, amp and speaker. The challenge I've hit is that the ESP32 does not have enough memory to send an audio file via HTTP POST. I have tried four different ways of sending multipart form data (text fields plus the audio file) to OpenAI, but all have failed; the most successful ones failed due to insufficient memory. A short 5-second WAV file is 111KB; the max memory on the ESP32 is 512KB, and some of that is already in use. The HTTP request needs to read the audio file from the SD card into memory, then use memcpy to build a correctly formatted payload to send to the API. That means two copies of the audio in memory at once, and 2 × 111KB is more than the ESP32 can handle, so it fails. This would obviously be even more of an issue with longer audio.
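The double-buffering described above is not inherent to multipart form data. The body can be sent piece by piece, so that only one small chunk of audio is in memory at a time. Below is a minimal Python sketch of that idea, not the code from this project: the function names, the 4096-byte chunk size and the field names are all assumptions chosen for illustration, and the real form fields depend on the API being called.

```python
import io

CHUNK_SIZE = 4096  # assumed chunk size; bytes of audio held in memory at once


def multipart_parts(fields, file_field, filename, file_obj, boundary,
                    chunk_size=CHUNK_SIZE):
    """Yield a multipart/form-data body piece by piece (hypothetical helper)."""
    # Plain text fields first, one part each.
    for name, value in fields.items():
        yield (f"--{boundary}\r\n"
               f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
               f"{value}\r\n").encode()
    # Header of the file part.
    yield (f"--{boundary}\r\n"
           f'Content-Disposition: form-data; name="{file_field}"; '
           f'filename="{filename}"\r\n'
           f"Content-Type: audio/wav\r\n\r\n").encode()
    # File contents, read and sent one chunk at a time.
    while True:
        chunk = file_obj.read(chunk_size)
        if not chunk:
            break
        yield chunk
    # Closing boundary.
    yield f"\r\n--{boundary}--\r\n".encode()


def content_length(fields, file_field, filename, file_size, boundary):
    """Compute Content-Length up front without loading the file."""
    # Run the generator over an empty file to measure the framing overhead.
    overhead = sum(len(part) for part in multipart_parts(
        fields, file_field, filename, io.BytesIO(b""), boundary))
    return overhead + file_size
```

Because the total length is known before any audio is read, the request can carry a normal Content-Length header and each piece can be written to the socket as soon as it is produced. On a microcontroller the same pattern would presumably mean reading a block from the SD card and writing it straight to the network client, which keeps peak memory near the chunk size rather than twice the file size.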

A few options I can see:

Find a way to expand the memory of the ESP32 maybe using SPI RAM
Move to an entirely different microcontroller

Briefly looking at microcontrollers, a Raspberry Pi could work, but it has no ADC or DAC for the microphone and speaker, which presents more problems.

Any suggestions appreciated.

Ya’akov · Feb 6, 2024

Welcome to AAC.

First, take a look at the Lyra TD Smart Speaker platform from Espressif. It is designed for the job, and is based on the ESP32. [getting started documentation here]

Second, you may have to stream the audio to a server, then forward it to ChatGPT.
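A rough sketch of what such a relay could look like, using only the Python standard library. It accepts a raw audio POST from the device, spools it to a temporary file in small chunks, and hands the file to a forwarding hook. Everything named here is an assumption for illustration: `spool_stream`, `forward_to_api`, `RelayHandler`, `run` and port 8080 are hypothetical, and the forwarding hook is a stub rather than a real call to any API.

```python
import tempfile
from http.server import BaseHTTPRequestHandler, HTTPServer

CHUNK_SIZE = 4096  # assumed chunk size


def spool_stream(src, length, chunk_size=CHUNK_SIZE):
    """Copy up to `length` bytes from `src` into a temp file, chunk by chunk."""
    spool = tempfile.TemporaryFile()
    remaining = length
    while remaining > 0:
        chunk = src.read(min(chunk_size, remaining))
        if not chunk:
            break  # sender closed the connection early
        spool.write(chunk)
        remaining -= len(chunk)
    spool.seek(0)  # rewind so the caller can read from the start
    return spool


def forward_to_api(audio_file):
    """Hypothetical hook: replace with the real upload to the speech API."""
    return b"(transcript placeholder)"


class RelayHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        with spool_stream(self.rfile, length) as spool:
            reply = forward_to_api(spool)
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)


def run(port=8080):
    """Start the relay; blocks until interrupted."""
    HTTPServer(("0.0.0.0", port), RelayHandler).serve_forever()
```

Calling `run()` on a PC or single-board computer on the same network would start the relay. The device then only has to push raw audio bytes over a plain HTTP connection, while the multipart formatting and the HTTPS connection to the API are handled on a machine with plenty of memory.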

As far as RPi goes, there are at least two options:

USB devices with their own ADC/DAC
An RPi sound board “hat” which can take analog input, and interfaces via the 40-pin header, or a USB dongle version.

The RPi, or possibly even the RPi Zero 2 W, would be a good choice. One advantage is a full Linux installation, which may make development somewhat easier.
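With a USB audio device or sound HAT on Linux, capture can be as simple as calling the ALSA `arecord` tool. The sketch below assumes ALSA's command-line utilities are installed and that the capture device shows up as `plughw:1,0`; the actual card and device numbers vary and can be checked with `arecord -l`. The function names and the 16 kHz mono format are illustrative choices, not requirements.

```python
import subprocess


def arecord_command(out_path, seconds=5, device="plughw:1,0", rate=16000):
    """Build an arecord command line for a mono 16-bit WAV recording."""
    return [
        "arecord",
        "-D", device,        # ALSA capture device (assumed; see `arecord -l`)
        "-f", "S16_LE",      # 16-bit signed little-endian samples
        "-c", "1",           # mono
        "-r", str(rate),     # sample rate in Hz
        "-d", str(seconds),  # stop after this many seconds
        "-t", "wav",         # write a WAV container
        out_path,
    ]


def record(out_path, seconds=5, **kwargs):
    """Record from the microphone; raises CalledProcessError on failure."""
    subprocess.run(arecord_command(out_path, seconds, **kwargs), check=True)
    return out_path
```

For example, `record("question.wav", seconds=5)` would write a five-second clip to disk, and the matching `aplay` tool can play a WAV reply back through the same sound device.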

strantor · Feb 6, 2024

Did you try asking chatGPT?
Maybe the question was longer than 5s.
</s>

You have reached (and in cases >5s, apparently exceeded) the boundary of complexity between where a microcontroller is suitable and where a computer is suitable.

I would use Raspberry Pi or other SBC for this.

teenflon5 · Feb 18, 2024

Thanks all. I think I've reached the limit of the ESP32, so I'm moving the project to a Raspberry Pi. Hopefully it will be straightforward!

Ya’akov · Feb 18, 2024

teenflon5 said:
Thanks all. I think I've reached the limit of the ESP32, so I'm moving the project to a Raspberry Pi. Hopefully it will be straightforward!

Did you see my answer in #2?
