Voice Recognition using Matlab

CoolZone · Nov 3, 2007

Hello people,

I also have a university project on voice recognition in MATLAB.
Can anyone help me with a project or a program that does voice recognition in MATLAB (preferably one with explanations)?
I'm using prerecorded wav files. The speaker is in front of the microphone, and the program must detect whether the speaker's voice matches the one in the samples when the speaker says the same words as in the samples.

Dave · Nov 3, 2007

We can help. Could you first give us an idea of what you have done so far, if anything? Perhaps you could also give us a brief overview of how you intend to go about it, starting from reading the wav file. It is generally a 6-7 part process. I'm not testing you and there is no right or wrong answer; I'm trying to get you to think about it.

Can you let us know what toolboxes you have access to? This will greatly help us in recommending some functions.

Dave

CoolZone · Nov 3, 2007

For example, I would like to have some numbers prerecorded in wav files; the program must then recognise and display the number when the speaker says it into the microphone.

As a toolbox, I could use the Voicebox Toolbox (http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html), or anything that is built into MATLAB 7.

Dave · Nov 3, 2007

Yes, that is the toolbox I would recommend you look at. There is nothing specifically built into MATLAB 7 for this, but you will need several of the core functions.

As discussed in several previous experiments on this topic, the general process involves:

1. Record the wav file, in your case through the microphone. As this is the basis for your subsequent analysis, it is essential that all datasets are obtained using the same equipment and process.
2. Read the wav file into MATLAB using the core function wavread.
3. Perform a Fast Fourier Transform on the wave data using the core function fft.
4. Represent the complex numbers in the FFT matrix as real numbers by multiplying each element by its complex conjugate.
5. Look at the absolute value of the important part of the data in the new matrix.
6. Split the FFT matrix into bins and take the average of each bin.
7. Standardise the result by dividing the matrix by its sum. When two such standardised results are compared, the higher the number, the higher the correlation between the two samples and the more likely the voices match.
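
The steps above can be sketched roughly as follows. This is only an illustration and is not taken from any particular implementation: the synthetic "voices", the sample rate fs, the bin count nBins and the use of corrcoef for the final comparison are all assumptions. In practice the signals would come from wavread (step 2).

```matlab
% Sketch of steps 3-7 on synthetic data, so no wav files are needed.
fs = 8000;                                   % assumed sample rate (Hz)
t  = (0:fs-1)'/fs;                           % one second of samples

% Hypothetical stand-ins for recordings read with wavread.
voiceA  = sin(2*pi*200*t) + 0.5*sin(2*pi*900*t);
voiceA2 = sin(2*pi*200*t) + 0.4*sin(2*pi*900*t);   % same "speaker", slight variation
voiceB  = sin(2*pi*200*t) + 0.5*sin(2*pi*2500*t);  % different "speaker"

signals = {voiceA, voiceA2, voiceB};
nBins   = 40;                                % assumed number of bins
sig     = zeros(nBins, numel(signals));      % one signature per column

for k = 1:numel(signals)
    X = fft(signals{k});                     % step 3: FFT
    P = real(X .* conj(X));                  % step 4: element-wise conjugate product
    P = P(1:floor(numel(P)/2));              % step 5: keep the non-mirrored half
    binLen = floor(numel(P)/nBins);          % FFT points per bin
    P = reshape(P(1:binLen*nBins), binLen, nBins);
    b = mean(P, 1)';                         % step 6: average of each bin
    sig(:,k) = b / sum(b);                   % step 7: divide by the sum
end

c = corrcoef(sig);                           % correlation between the signatures
scoreSame = c(1,2);                          % voiceA against voiceA2
scoreDiff = c(1,3);                          % voiceA against voiceB
```

With these made-up signals scoreSame comes out higher than scoreDiff, which is the behaviour step 7 describes. Real speech will be far less clean.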

Dave

CoolZone · Nov 3, 2007

can you please give me a sample to look at ?

Dave · Nov 3, 2007

As an example:

Stages 1-6: http://www.andrew.cmu.edu/user/jterlesk/robotics/voice/soundSig.m

Summing stage: http://www.andrew.cmu.edu/user/jterlesk/robotics/voice/run.m

This code is very crude, but it will allow you to distinguish between different voices, i.e. voice A is different from voice B. That is the limit of what it does. I would also be interested in seeing how this code fares against a large data sample.

Dave

CoolZone · Nov 4, 2007

I already knew this code; I really hoped you had something better than this one.

Dave · Nov 4, 2007

You asked for a sample and a sample is what you got. Perhaps you could contribute something to this discussion, such as some of your own original work. At the moment you seem to be under some illusion as to what you are going to get out of this.

Dave

GU_dx · Nov 10, 2007

Hi, I'm also interested in this project since I also need to do a project on audio. I understood the basic algorithm to be used, but what else could be done to make the voice recognition program better and more efficient? I'm not asking for code, just ideas, so that I can try to code/implement them.
Thanks

afab1986 · Dec 14, 2007

Hello everybody,
First of all, I am a new member here and I'm really interested in what you are all discussing.
The process that Dave posted is good, but could anybody explain the operations in it mathematically, i.e. with equations?
I ask because I haven't taken a DSP course yet.
In fact, I may include this idea in my graduation project if I get it right.
Keep up the good work,

Dave · Dec 15, 2007

Hi,

Many of the functions and mathematical operations can be implemented using built-in MATLAB functions, which lets the user abstract away from some of the complexities.

The only real mathematical operation is the FFT. Have you come across this before? Have you done the standard Fourier Transform? I'll explain it if you are unsure about how it works.

Dave

afab1986 · Dec 17, 2007

Hi Dave,
"Have you done the standard Fourier Transform?"
Yes, I have, in my signals analysis and communication courses, but not the FFT yet.
I'm supposed to take the DSP course (including the FFT) next semester.
What I want is to understand the MATLAB code posted previously and get the idea mathematically, i.e. what is the relation between the matrix we get out of the FFT after "standardisation" using the "bins", and the correlation which the code uses for comparing and judging the voice?
I hope you get the idea,
Thanks for your help.

Dave · Dec 20, 2007

The FFT is just an efficient implementation of the Discrete Fourier Transform (DFT). It achieves this by removing redundancy from the calculation, exploiting the properties of the roots of unity (a Google search will explain what they are). Basically it works on the principle that some of the DFT calculations are already known without needing to be computed explicitly. The important point is that the FFT gives the same result as the DFT.
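
As a quick check of that last point, the sketch below evaluates the DFT directly from its definition, X(k) = sum over n of x(n)*exp(-2*pi*i*k*n/N), and compares it with fft. The test signal and the length N are arbitrary choices made for illustration.

```matlab
N = 8;                               % arbitrary small length
x = (1:N)';                          % arbitrary test signal
n = 0:N-1;

% DFT matrix: every entry is a power of the N-th root of unity exp(-2*pi*i/N).
W = exp(-2i*pi*(n'*n)/N);

X_dft = W * x;                       % direct evaluation, about N^2 multiplications
X_fft = fft(x);                      % same transform, computed the fast way

maxDiff = max(abs(X_dft - X_fft));   % differs only by rounding error
```

The matrix W contains only N distinct values repeated many times, and that repetition is the redundancy the FFT exploits.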

When you perform the DFT you are mapping a time-domain signal (such as a voice signal) into the frequency domain. For voice comparison we are not looking at pitch similarities (which, to the human ear, can be similar for different people) but at a match of the frequency components in the sound. So although person A may sound like person B to you and me, if we map the audio into the frequency domain we would expect distinct differences that mark one from the other.

The conjugate multiplication and absolute value are nothing more than crude conditioning operations that allow one sample to be compared with another.

Why split into bins? A bin is a set of values from the FFT matrix that contains the energy (or effective voltage) over a frequency range rather than at a single frequency. Single frequency components are not of much use because they are affected by other, often experiment-related, variables; call it safety in numbers. It is important to stress that too small a bin size is useless for comparative purposes, whereas too large a bin dilutes the result. There are lots of sources on how to determine your bin size in relation to your frequency range. You also need to average your (energy) value across the bin to obtain a single value for that bin.
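
To put numbers on the bin size, here is a small worked sketch. The sample rate, FFT length, bin sizes and the two-peak power spectrum are all made-up values chosen for illustration.

```matlab
fs     = 8000;                        % assumed sample rate (Hz)
nfft   = 8000;                        % assumed FFT length (one second of audio)
binLen = 100;                         % assumed FFT points per bin

df    = fs / nfft;                    % spacing between FFT points: 1 Hz
binHz = binLen * df;                  % frequency range covered by one bin: 100 Hz
nBins = floor((nfft/2) / binLen);     % bins needed to cover 0 .. fs/2: 40

% Hypothetical power spectrum with peaks at 200 Hz and 900 Hz.
P = zeros(nfft/2, 1);
P(201) = 1;                           % index 201 corresponds to 200 Hz
P(901) = 0.25;                        % index 901 corresponds to 900 Hz

fine   = mean(reshape(P, 100,  []), 1);   % 40 bins of 100 Hz each
coarse = mean(reshape(P, 2000, []), 1);   % 2 bins of 2000 Hz each

nFine   = nnz(fine);                  % 2: the peaks land in separate bins
nCoarse = nnz(coarse);                % 1: both peaks are merged into one bin
```

With the coarse bins the two peaks can no longer be told apart, which is the dilution mentioned above.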

Finally you divide the FFT matrix by the average value for the bin within that frequency range. This tells you how your FFT value (the frequency mapping for your chosen voice sample) for a particular frequency component compares with the average for that bin. Plotting energy against bin/frequency gives you a mapping from which you can compare two voices for similarity.

As I stated previously, this is a crude method that will allow you to distinguish between two different people. There are many further tweaks and analysis techniques you can implement to make the recognition package better, but hopefully this will give you a start.

Dave

BlackBox · Dec 21, 2007

I would suggest using cepstral analysis instead of the FFT. It is much easier to recognise patterns deriving from the effect of the vocal tract physiology in the cepstral domain.

Cepstrum

If you are an IEEE member you could search IEEE Xplore for a plethora of cepstrum-related voice recognition papers.

Good work

Dave · Dec 21, 2007

Yes, it is certainly worth considering. You will still need to take the Fourier Transform (FFT) if you are using cepstral analysis, as it is an explicit part of the calculation. From my brief look through IEEE Xplore, there is plenty of evidence to suggest that it is a suitable tool for voice recognition.
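
To show where the FFT sits in the calculation, here is a minimal sketch of the real cepstrum (inverse FFT of the log magnitude spectrum) using core functions only. The synthetic frame, the frame length N, the Hann window and the choice of 13 coefficients are assumptions for illustration, not a recommendation from this thread.

```matlab
fs = 8000;                                   % assumed sample rate (Hz)
N  = 1024;                                   % assumed frame length
t  = (0:N-1)'/fs;

% Hypothetical voiced frame: harmonics of a 200 Hz fundamental.
x = zeros(N, 1);
for h = 1:10
    x = x + (1/h) * sin(2*pi*200*h*t);
end

w = 0.5 * (1 - cos(2*pi*(0:N-1)'/(N-1)));    % Hann window written out from its definition
X = fft(x .* w);                             % the FFT is still the first step
c = real(ifft(log(abs(X) + eps)));           % real cepstrum; eps avoids log(0)

% The low-quefrency coefficients describe the smooth spectral envelope,
% which is where vocal tract effects show up. c(1) only reflects the
% overall level, so it is skipped here.
nCoef = 13;                                  % assumed feature length
feat  = c(2:nCoef+1);
```

The vector feat could then be compared between recordings in the same way as the binned FFT values.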

Dave

afab1986 · Dec 22, 2007

Hi Dave,
I will start researching and studying this project as soon as possible because I liked the idea, and hopefully I will implement it in my future graduation project.
Thanks again, and keep up the good work.

Dave · Dec 23, 2007

Good luck with your research and project. Keep us posted on how it goes, I'd be interested in seeing your project come along. Also feel free to ask any further questions, if we can help we will.

Dave

d_devil · Dec 20, 2008

Hey, I've just started reading about speech recognition systems, and before starting to write code myself I wanted to see the sample code someone mentioned above.

That page seems to have been removed, so can anyone help me with this?

Dave · Dec 24, 2008

The two codes previously referenced can be retrieved from web archive:

http://web.archive.org/web/20070902...u.edu/user/jterlesk/robotics/voice/soundSig.m

http://web.archive.org/web/20070902...ew.cmu.edu/user/jterlesk/robotics/voice/run.m

The code is very crude and I would suggest you look at more advanced techniques (some of which I have outlined in this thread). But these scripts are a good starting point.

Dave

pulkit.143 · Mar 25, 2009

Hey Dave,
I have imported the sound file directly into MATLAB and now I want to plot its FFT. I am trying some code but it is not working. I am posting the code here; can you tell me where I am going wrong?

[data,fs,nbits] = wavread("host.wav"); % Read wav file
data_fft = fft(data); % Perform FFT
P_data_fft = data_fft.* conj(data_fft) / size(data_fft,2); % Deduce Power Spectra
f = 1000*(0:(size(data_fft,2)))/size(data_fft,2); % Define frequency range over which to plot power spectra. This is half the size of the fft since there is merely a reflection around the dc point
plot(f,P_data_fft(1:(size(data_fft,2)+1))) % Plot
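
A few likely causes, offered as a sketch rather than a definitive diagnosis. wavread returns one column per channel, so size(data_fft,2) is the number of channels (1 for a mono file), not the number of samples. The frequency axis is scaled by a fixed 1000 rather than by the sample rate fs. And MATLAB 7 expects single quotes around the file name. The version below uses a synthetic 440 Hz tone in place of host.wav so that it runs without the file; the tone and its sample rate are assumptions.

```matlab
% [data, fs] = wavread('host.wav');       % as in the post, with single quotes
fs   = 8000;                              % stand-in sample rate (Hz)
data = sin(2*pi*440*(0:fs-1)'/fs);        % stand-in signal: one second of 440 Hz

data = data(:,1);                         % use the first channel only
N    = length(data);                      % number of samples (rows, not columns)

data_fft   = fft(data);                   % perform FFT
P_data_fft = real(data_fft .* conj(data_fft)) / N;   % power spectrum

half = floor(N/2) + 1;                    % DC up to the Nyquist frequency
f    = fs * (0:half-1)' / N;              % frequency axis in Hz

[pk, iPk] = max(P_data_fft(1:half));      % strongest component
peakHz    = f(iPk);                       % 440 for this stand-in tone

plot(f, P_data_fft(1:half));              % plot the non-mirrored half
xlabel('Frequency (Hz)');
ylabel('Power');
```

Only the first half of the spectrum is plotted because, for a real input, the second half is a mirror image of the first.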
