Speech Recording and Playback


Joined Jan 29, 2010
Hi JHT,
Look at the ISD17xx series of audio ICs.
I have been 'playing' with an ISD1760 [60 seconds] with SPI control from a Nano.
With an external 3 W audio amp and a decent speaker, the quality is acceptable.
[Attached image: EG57_ 156.png]


Joined Oct 2, 2009
I built a telephone caller ID annunciator using a Winbond ISD4002 circuit (or something similar). It would announce the name of the caller, which I had recorded and placed in a personal caller list. If the number was not in the list, it would announce the phone number instead. This was many years ago and I don't know what chips are available now.


Joined Jan 27, 2019
The modules I have found to be quite capable (not just a chip, it's an entire module) are the JR6001 modules. They are incredibly cheap considering their functionality. They have on-board flash and accept MP3 (and IIRC, WAV too) files: when the on-board microUSB is connected to a computer, the module appears as a USB mass storage device.

They have serial control, and a command set that includes commands specifically designed to do what you want. They also have a (small) audio amp on board that can drive a speaker, though I don't know how well it would work for you. The demo files that come loaded on them even have numbers recorded for an example like what you want to do.

I've attached the datasheet for the module. If you want to try it, I have worked out how to use it, should you need help.
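
For the "speak the numbers" part, here is a rough sketch of the host-side logic only, not the module's actual command set. It assumes a hypothetical layout where clips 1 to 10 on the module hold the spoken digits 0 to 9, and `play_clip` is a placeholder for whatever serial command the datasheet gives for "play clip N". Both names are mine, not the module's.

```python
# Hypothetical layout: clip 1 holds "zero", clip 2 holds "one", ... clip 10 holds "nine".
DIGIT_CLIP_BASE = 1  # assumption: clip number of the recording for "zero"


def digits_to_clips(phone_number):
    """Map a phone number string to the clip numbers to play, skipping punctuation."""
    return [DIGIT_CLIP_BASE + int(ch) for ch in phone_number if ch in "0123456789"]


def announce(phone_number, play_clip):
    """Play each digit clip in order using a caller-supplied play_clip(n) function."""
    for clip in digits_to_clips(phone_number):
        play_clip(clip)


if __name__ == "__main__":
    # Stand-in for the real serial write: just show which clips would be requested.
    announce("555-0123", lambda n: print("play clip", n))
```

In a real build, `play_clip` would send the module's play command over the serial port and wait for the clip to finish before sending the next one.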


Audioguru again

Joined Oct 21, 2019
I like to hear wideband music and speech. An AM radio or old telephone that cuts off at 3 kHz eliminates most consonants in speech, causing "What? What did you say?" over and over.
I think a 1 kHz cutoff will pass only vowels, not intelligible speech, and will sound like a throat mic.

Audioguru again

Joined Oct 21, 2019
Many videos show that the frequency response required for American speech must include the important consonant sounds between 2 kHz and 4 kHz for good intelligibility.
"Ee el ee el a uh ee or" is what you hear with poor levels of high frequencies. "She sells seashells at the seashore" is what you hear with normal levels of high frequencies.
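
To put rough numbers on how much of the 2 kHz to 4 kHz consonant band survives a given cutoff, here is a small sketch using the ideal Butterworth low-pass magnitude formula. The filter order is an assumption on my part; real telephone and AM radio filters are usually steeper than first order, so treat the figures as a lower bound on the loss.

```python
import math


def lowpass_gain_db(f_hz, cutoff_hz, order=1):
    """Gain in dB of an ideal Butterworth low-pass: |H|^2 = 1 / (1 + (f/fc)^(2n))."""
    ratio = (f_hz / cutoff_hz) ** (2 * order)
    return -10.0 * math.log10(1.0 + ratio)


if __name__ == "__main__":
    # Compare the 1 kHz and 3 kHz cutoffs mentioned above across the consonant band.
    for cutoff in (1000, 3000):
        for f in (2000, 3000, 4000):
            gain = lowpass_gain_db(f, cutoff)
            print(f"cutoff {cutoff} Hz, tone {f} Hz: {gain:.1f} dB")
```

By definition the gain is about -3 dB at the cutoff frequency itself, and it falls faster above the cutoff as `order` is raised.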