Wireless speaker/microphone for robot. Suggestions?

Thread Starter

WBahn

Joined Mar 31, 2012
30,077
I need to give a robot a voice and some ears.

Basically, I need to mount a wireless speaker and a microphone on him that can communicate with a Linux PC located no further than perhaps fifteen feet. I need to capture the raw audio data from the mic and send raw audio data to the speaker. No harsh performance specs involved, just looking for voice and don't need high fidelity. The PC will work with the data either via Java or Python.

So what options should I be looking for? If I get a generic WiFi speaker, can I send it data directly, or am I likely to have to use some app that comes with it and that has no useful API since it's meant for consumer entertainment?

I'm thinking that the ideal situation (but open to suggestions) would be if the speaker and microphone used UDP datagrams that I could just generate or capture. Does anyone know of any speaker or mic for which this is possible?
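
For what it's worth, the PC side of the UDP idea is not much code. Here is a minimal Python sketch, assuming the audio is signed 16-bit mono PCM at 16 kHz, sent as 20 ms frames that each carry a 4-byte sequence number so dropped or reordered datagrams can be spotted. The frame layout, the constants, and the function names are all assumptions for illustration; I don't know of a particular speaker or mic that speaks this format.

```python
import socket
import struct

SAMPLE_RATE = 16000       # assumed: 16 kHz is plenty for voice
FRAME_MS = 20             # assumed: 20 ms of audio per datagram
BYTES_PER_SAMPLE = 2      # assumed: signed 16-bit mono PCM
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * BYTES_PER_SAMPLE  # 640 bytes


def pack_frame(seq, pcm):
    """Prefix one PCM frame with a 32-bit sequence number (network byte order)."""
    return struct.pack("!I", seq & 0xFFFFFFFF) + pcm


def unpack_frame(datagram):
    """Split a datagram back into (sequence number, PCM bytes)."""
    (seq,) = struct.unpack("!I", datagram[:4])
    return seq, datagram[4:]


def send_pcm(sock, addr, pcm, start_seq=0):
    """Cut a PCM buffer into frames and send each as one datagram.

    Returns the sequence number to use for the next call.
    """
    seq = start_seq
    for offset in range(0, len(pcm), FRAME_BYTES):
        sock.sendto(pack_frame(seq, pcm[offset:offset + FRAME_BYTES]), addr)
        seq += 1
    return seq


def recv_frame(sock):
    """Block until one datagram arrives and return (sequence number, PCM bytes)."""
    datagram, _sender = sock.recvfrom(4 + FRAME_BYTES)
    return unpack_frame(datagram)
```

With a socket made by `socket.socket(socket.AF_INET, socket.SOCK_DGRAM)`, something like `send_pcm(sock, ("192.168.1.50", 5005), pcm)` would push a buffer toward the robot; that address and port are made up.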

Would Bluetooth be a viable route to go, or am I even more likely to end up not being able to interface to it at the lower level that I need to?

If anyone needs more information than I've given, just say so and I'll try to answer any questions you have.
 

Alec_t

Joined Sep 17, 2013
14,333
There are plenty of Bluetooth-enabled speakers/headphones and audio-transmitters advertised, intended for such things as sending line-level audio signals from a TV to headphones. Those transmitters should be able to get a line-level signal from a PC audio card.
 

nsaspook

Joined Aug 27, 2009
13,312
At first, one might think that Bluetooth is the way to go, but I am not sure that it is the cheapest or easiest way to go, especially if your audio data requirements are not too intense.

I think an inexpensive, highly controllable, and easy option to consider is the ESP8266. There are lots of "audio streaming" applications that folks have written up.

search esp8266 sound streaming
search esp8266 audio streaming

In your case, I think you basically want two baby monitors or two walkie talkies.

search esp8266 baby monitor
search esp8266 walkie talkie
 

wayneh

Joined Sep 9, 2010
17,498
People around here are probably getting tired of me suggesting this, but I love my Wyzecam. It supports 2-way audio over wifi and is only ~$20. I guess it's not technically two-way audio because it's push-to-talk at the client end and you can't hear the camera when you're talking. But anyway it does a lot for $20.
 

cmartinez

Joined Jan 17, 2007
8,257
I was going to suggest the exact same thing.
 

Thread Starter

WBahn

Joined Mar 31, 2012
30,077
I'll look at that, but the push-to-talk (PTT) might be a deal killer since the idea is for the robot to have as normal a conversation with the person he is interacting with as possible.
 

Thread Starter

WBahn

Joined Mar 31, 2012
30,077
I wonder if the cheapest solution might not be to simply modify a pair of cheap Walkie-Talkies...with these you also get a cool flashlight :)
Do any of those Walkie-Talkies have computer (most likely USB) interfaces? Also, do any of them NOT have push-to-talk?

Remember that the whole point is to get audio from the microphone into the remote PC for processing and to get audio generated on the remote PC played back by the speakers.
 

Thread Starter

WBahn

Joined Mar 31, 2012
30,077
I have some IP cameras that support duplex audio. But as they are badge engineered, the brand is really hard to work out. It could be you may be able to use the camera motion controls to drive your robot instead of the camera. Or just the robot head.
Something like this, at least, to look at.
https://www.ebay.com.au/itm/HD-720P...hash=item4b260ac809:m:maJcMwE3KpZyUMmui2eMeCQ
Driving the robot isn't an issue -- that is done via ROS (Robot Operating System).

Baby monitor systems might be worth looking into.

I've also discovered a term that is helping: "podcast" microphone. Looks like some possibilities there.
 
Do any of those Walkie-Talkies have computer (most likely USB) interfaces? Also, do any of them NOT have push-to-talk?

Remember that the whole point is to get audio from the microphone into the remote PC for processing and to get audio generated on the remote PC played back by the speakers.
Well I was thinking that you would kludge a transistor switch on one to hold it in talk - that serves as the robot ears. A second is the receiving unit always listening (into the PC). A second set would be needed to talk into and have the voice broadcast on the robot. But, don't forget that cool flashlight!
 

Thread Starter

WBahn

Joined Mar 31, 2012
30,077
Who's talking into this second set?

This is for an autonomous robot interacting with a human.

The human says something. That is captured by the robot's microphone and sent to the PC, which processes the audio to determine what was said. This information is then acted upon by the robot's programming to do something. For instance, the person says, "Hand me the screwdriver," and the robot picks up the screwdriver and places it in the person's hand. Or the robot recognizes that there are multiple screwdrivers and responds, "Which screwdriver do you want?", which is sent from the robot's programming to a program on the PC that then generates the audio signal and plays it out the robot's speaker.
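
The decision step in that loop can be sketched in a few lines of Python. This is only an illustration: the grammar, the action names, and the `interpret` function are hypothetical and are not taken from ROS or from any speech package. It takes the recognized text plus a list of items the robot can currently see and returns either an action or a sentence to speak.

```python
import re

# Hypothetical grammar: (pattern, action name). Invented for illustration.
GRAMMAR = [
    (re.compile(r"^hand me the (?P<item>\w+)$"), "fetch"),
    (re.compile(r"^stop$"), "halt"),
]


def interpret(utterance, visible_items):
    """Map recognized text to a tuple describing what the robot should do.

    Returns ("action", name, item), ("say", sentence), or ("ignore",).
    """
    # Normalize: lowercase and drop punctuation so "Stop!" matches "stop".
    text = re.sub(r"[^\w\s]", "", utterance.lower()).strip()
    for pattern, action in GRAMMAR:
        match = pattern.match(text)
        if not match:
            continue
        item = match.groupdict().get("item")
        if item is None:
            return ("action", action, None)
        # An item matches when its last word is the noun that was spoken.
        candidates = [v for v in visible_items if v.lower().split()[-1] == item]
        if len(candidates) == 1:
            return ("action", action, candidates[0])
        if len(candidates) > 1:
            return ("say", f"Which {item} do you want?")
        return ("say", f"I don't see a {item}.")
    return ("ignore",)
```

Calling `interpret("Hand me the screwdriver.", ["Phillips screwdriver", "flathead screwdriver"])` gives back the clarifying question, while a single visible screwdriver gives a `fetch` action for it.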
 
Who's talking into this second set?

This is for an autonomous robot interacting with a human.

The human says something. That is captured by the robot's microphone and sent to the PC, which processes the audio to determine what was said. This information is then acted upon by the robot's programming to do something. For instance, the person says, "Hand me the screwdriver," and the robot picks up the screwdriver and places it in the person's hand. Or the robot recognizes that there are multiple screwdrivers and responds, "Which screwdriver do you want?", which is sent from the robot's programming to a program on the PC that then generates the audio signal and plays it out the robot's speaker.
Oh, I got ya...I thought you were getting an early start on freaking out kids at Halloween.

"Basically, I need to mount a wireless speaker and a microphone on him that can communicate with a Linux PC located no further than perhaps fifteen feet. I need to capture the raw audio data from the mic and send raw audio data to the speaker. No harsh performance specs involved, just looking for voice and don't need high fidelity."

WT1 is used as the wireless microphone on the robot. It captures audio data and sends it to the PC for processing. The PC has WT2 to get the speech data (I am assuming that the PC is what is 15 ft. away).

WT3 is used as the wireless speaker on the robot. The PC uses WT4 to send audio data to WT3 if needed ("Which screwdriver?")
 

Thread Starter

WBahn

Joined Mar 31, 2012
30,077
So WT1 is mounted on the robot with the PTT button permanently held down. The microphone has to be suitable for capturing human speech from a distance of up to 15 ft (it's the human that is within about 15 feet of the robot). Most Walkie Talkies are designed for normal speech an inch or two away from the microphone. But even if I find one that is suitable in that regard, WT2, which is located near the PC, wants to play the speech out its speaker. So how do I get that into the PC? Using a local mic is not an option because we can't have the sound being audible. So now I have to hack the WT to pull out the signal -- although I would imagine that many WTs have external speaker or headphone jacks and that might be compatible with the line-in on the computer's sound card.

But what about going the other way? Do some WTs have an external input that I could feed from the audio output of the sound card? I guess that's probably fairly likely given the widespread use of headsets for so many things these days.

But it's still an awful kludge.

Since asking the question I think I've decided on an even simpler approach that has some additional advantages.

I'm going to fashion a mount on the robot for one PC and connect both that PC and the robot to an Ethernet switch. Since that PC will be physically located with the robot, there is no problem running wired audio to the microphone(s) and speaker(s). This will give the robot true self-contained autonomy and at the same time allow me to use static IP addresses and have an isolated network for security purposes.

But the robot also has to interact with other systems, so the switch will actually be part of a wireless router configured as a bridge to another router located with an optional second PC. This PC can then also sit on whatever other network anything else is on and act as the intermediary between that network and the robot.
 
But it's still an awful kludge.
As I learn more about what the specific intended setup is, yeah, likely so.

It seems like an interesting project. How are you going to do the speech recognition?

I have some experiments with speech recognition on a not-so-back burner and was taken by how difficult speaker-independent recognition is...unless you want this "universal" Google-type cloud-based stuff... https://cloud.google.com/speech-to-text/ ...and yes, it is impressive, but it does not belong to me the way a program I write does. At least that was my feeling after looking into it a bit.

I did settle on this one https://www.mikroe.com/speakup-2-click to go with this board http://www.microchip.com/Developmenttools/ProductDetails.aspx?PartNO=DM320104 I will see how far I get...
 

Thread Starter

WBahn

Joined Mar 31, 2012
30,077
The system that is currently being used is TactSpeak, which is a Windows-only application and requires a pretty hefty licensing fee in order to modify the grammar. It does both Speech-to-Text and Text-to-Speech and does a good enough job for our needs (at least right now). But since it has to run on a Windows machine, we have it on another PC listening to its microphone and checking whether it finds a match to the grammar. If so, it sends the corresponding string over the network, which the ROS machine can see and then act upon. The reverse happens in the other direction: the application on the ROS machine sends network traffic with the sentence that needs to be spoken, and the TactSpeak machine sees it, synthesizes the speech, and plays it out its speakers.
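
A rough Python sketch of that kind of string passing, assuming each message is a small JSON object that could be carried in a UDP datagram or on a TCP line. The message layout (a `kind` of `heard` or `say` plus the `text`) and the function names are assumptions for illustration only; this is not how TactSpeak or the ROS application actually format their traffic.

```python
import json

KINDS = ("heard", "say")  # assumed message kinds, not from any real protocol


def encode(kind, text):
    """Build one payload carrying a recognized or to-be-spoken sentence."""
    if kind not in KINDS:
        raise ValueError(f"unknown kind: {kind!r}")
    return json.dumps({"kind": kind, "text": text}).encode("utf-8")


def decode(payload):
    """Parse a payload back into (kind, text); raises ValueError on bad input."""
    msg = json.loads(payload.decode("utf-8"))
    if (
        not isinstance(msg, dict)
        or msg.get("kind") not in KINDS
        or not isinstance(msg.get("text"), str)
    ):
        raise ValueError("malformed message")
    return msg["kind"], msg["text"]


def dispatch(payload, on_heard, on_say):
    """Route a payload: 'heard' goes to the robot logic, 'say' to the synthesizer."""
    kind, text = decode(payload)
    handler = on_heard if kind == "heard" else on_say
    return handler(text)
```

The recognizer side would send `encode("heard", "hand me the screwdriver")` and the robot side would send `encode("say", "Which screwdriver do you want?")`, with each receiver calling `dispatch` on whatever arrives.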

Because we don't want to have the additional machine and because we don't want to pay the licensing fee to modify the grammar, we are moving to Sphinx, which is an open-source Linux project. I haven't started playing with it, yet.
 