Hello,
I'm looking for some advice on a project I'm working on, specifically on which microcontroller is best suited to it. I've already tried one approach but hit a number of hurdles and need to go back to square one.
The project is a physical desktop intercom that needs only a power supply and acts as an intercom with ChatGPT.
The program will record audio, send it to OpenAI via an HTTP POST to transcribe the speech to text, use the API to get an answer to the question, then use an API to convert the answer to speech before playing it back on the speaker.
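To make the flow concrete, here is a rough sketch of one question-and-answer turn as three chained stages. It is only an outline, not my actual code: `Stages`, `Audio` and `runIntercomTurn` are made-up names, and each stage is passed in as a function so the real HTTP calls can be swapped for stubs while testing.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical types for illustration; none of these are library names.
using Audio = std::vector<unsigned char>;

// One callable per API round trip, injected so they can be stubbed.
struct Stages {
    std::function<std::string(const Audio&)> speechToText;    // audio -> question text
    std::function<std::string(const std::string&)> answer;    // question -> reply text
    std::function<Audio(const std::string&)> textToSpeech;    // reply text -> audio
};

// Runs one turn: recorded audio in, audio to play on the speaker out.
Audio runIntercomTurn(const Audio& recorded, const Stages& s) {
    assert(s.speechToText && s.answer && s.textToSpeech);
    const std::string question = s.speechToText(recorded);
    if (question.empty()) {
        return {};  // nothing recognised, so skip the two remaining requests
    }
    const std::string reply = s.answer(question);
    return s.textToSpeech(reply);
}
```

Each turn is three separate requests, and only the first one has to upload a large binary body.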
So far I have a breadboard with an ESP32, SD card module, microphone, amp and speaker. The problem I've hit is that the ESP32 does not have enough memory to send an audio file via HTTP POST. I have tried four different methods of sending multipart form data (text fields plus the audio file) to OpenAI, and all have failed; the most successful ones failed due to insufficient memory. A short 5-second WAV file is 111KB. The ESP32 has 520KB of internal SRAM, but part of that is already in use. The HTTP request has to read the audio file from the SD card into memory, then use memcpy to build a correctly formatted payload to send to the API. That means holding two copies of the audio in RAM at once, about 222KB, which is more than the ESP32 has free, so it fails. This would obviously be even more of an issue with longer audio.
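For reference, this is roughly what the multipart payload looks like, together with one possible way of sending it without holding the whole file in RAM. It is an untested sketch rather than my real code. `buildParts`, `contentLength` and `streamBody` are hypothetical helpers, the `model` and `file` field names are my understanding of what the transcription endpoint expects, and the read and write callbacks stand in for the SD card file and the network socket.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helpers; nothing here is an ESP32 or OpenAI library call.
struct MultipartParts {
    std::string head;  // boundary lines and part headers sent before the file bytes
    std::string tail;  // closing boundary sent after the file bytes
};

// The boundary is assumed not to occur anywhere inside the audio data.
MultipartParts buildParts(const std::string& boundary,
                          const std::string& model,
                          const std::string& filename) {
    MultipartParts p;
    p.head = "--" + boundary + "\r\n"
             "Content-Disposition: form-data; name=\"model\"\r\n\r\n" +
             model + "\r\n"
             "--" + boundary + "\r\n"
             "Content-Disposition: form-data; name=\"file\"; filename=\"" +
             filename + "\"\r\n"
             "Content-Type: audio/wav\r\n\r\n";
    p.tail = "\r\n--" + boundary + "--\r\n";
    return p;
}

// Content-Length can be computed from the file size alone.
std::size_t contentLength(const MultipartParts& p, std::size_t fileSize) {
    return p.head.size() + fileSize + p.tail.size();
}

using ReadFn = std::function<std::size_t(char* buf, std::size_t cap)>;  // returns 0 at end of file
using WriteFn = std::function<void(const char* data, std::size_t len)>;

// Sends head, then the file in 512-byte chunks, then tail.
// Returns the number of body bytes written.
std::size_t streamBody(const MultipartParts& p, ReadFn readChunk, WriteFn writeOut) {
    assert(readChunk && writeOut);
    char buf[512];  // the only audio held in RAM at any one time
    std::size_t total = p.head.size();
    writeOut(p.head.data(), p.head.size());
    for (std::size_t n; (n = readChunk(buf, sizeof buf)) > 0;) {
        writeOut(buf, n);
        total += n;
    }
    writeOut(p.tail.data(), p.tail.size());
    return total + p.tail.size();
}
```

Because the file size on the SD card is known before sending, the Content-Length header can be written first and the body streamed afterwards. With this layout the peak buffer use would be the short framing strings plus one 512-byte chunk, instead of 2 x 111KB.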
A few options I can see:
- Find a way to expand the memory of the ESP32 maybe using SPI RAM
- Move to an entirely different microcontroller
Looking briefly at other boards, a Raspberry Pi could work, but it has no built-in ADC or DAC for the microphone and speaker, which presents more problems.
Any suggestions appreciated.