Voice Recognition using Matlab

Discussion in 'Programmer's Corner' started by CoolZone, Nov 3, 2007.

  1. CoolZone

    Thread Starter New Member

    Nov 3, 2007
    4
    0
    Hello people,

    I also have a university project on voice recognition in MATLAB.
    Can anyone help me with a project or a program that does voice recognition in MATLAB (preferably one with explanations)?
    I'm using prerecorded wav files. The speaker talks into the microphone, and the program must detect whether the speaker has the same voice as the one in the samples, saying the same words as in the samples.
     
  2. Dave

    Retired Moderator

    Nov 17, 2003
    6,960
    144
    We can help. Could you first give us an idea of what you have done so far, if anything? Perhaps you could also give us a brief overview of how you intend to go about it, starting from reading the wav file. It is generally a 6-7 part process. I'm not testing you and there is no right or wrong answer to this; I'm trying to get you to think about it.

    Can you let us know what toolboxes you have access to? This will greatly help us recommend some functions.

    Dave
     
  3. CoolZone

    Thread Starter New Member

    Nov 3, 2007
    4
    0
    For example, I would like to have some numbers prerecorded in wav files; the program must then recognise and display the number when the speaker says it into the microphone.

    As a toolbox, I could use the Voicebox Toolbox (http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html), or anything that is built into Matlab 7.
     
  4. Dave

    Retired Moderator

    Nov 17, 2003
    6,960
    144
    Yes, that is the toolbox I would recommend you look at using. There is nothing built into Matlab 7 specifically for this, but you will need several of the core functions.

    As discussed in several previous experiments on this topic, the general process involves:

    1. Record the wav file, in your case through the speakers and microphone. As this is the basis for your subsequent analysis, it is essential that all datasets are captured using the same equipment and process.
    2. Read the wav file into Matlab using the core function wavread.
    3. Perform a Fast Fourier Transform on the wave data using the core function fft.
    4. Turn the complex values in the FFT matrix into real numbers by multiplying the matrix, element by element, by its complex conjugate.
    5. Look at the absolute value of the important part of the data in the new matrix.
    6. Split the FFT matrix into bins and take the average of each bin.
    7. Standardise the result by dividing the matrix by its sum. When two samples are then compared, the higher the number, the higher the correlation between them and the more likely the voices match.
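
    The steps above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a reference implementation: the synthetic sine-based "voices", the sample rate fs, the bin count nBins and the use of corrcoef for the final comparison are all chosen here for illustration. With real recordings the signals would come from wavread (audioread in newer MATLAB releases).

```matlab
% Steps 2-7 on synthetic signals (stand-ins for recorded wav data).
fs = 8000;                                    % assumed sample rate in Hz
t  = (0:fs-1)' / fs;                          % one second of samples
voiceA = sin(2*pi*200*t) + 0.5*sin(2*pi*400*t);   % hypothetical speaker A
voiceB = sin(2*pi*300*t) + 0.5*sin(2*pi*900*t);   % hypothetical speaker B
signals = {voiceA, voiceB, voiceA + 0.05*randn(size(t))};  % third: A again, noisy

nBins = 40;                                   % assumed number of bins
sig = zeros(nBins, numel(signals));           % one signature per column
for i = 1:numel(signals)
    X = fft(signals{i});                      % step 3: FFT
    P = real(X .* conj(X));                   % step 4: times complex conjugate
    P = P(1:floor(numel(P)/2));               % step 5: keep the half below Nyquist
    binLen = floor(numel(P) / nBins);
    P = reshape(P(1:binLen*nBins), binLen, nBins);  % step 6: one column per bin
    b = mean(P, 1)';                          % average of each bin
    sig(:, i) = b / sum(b);                   % step 7: divide by the sum
end

% Comparison (an assumption: correlation coefficient of two signatures).
cAB = corrcoef(sig(:,1), sig(:,2));  scoreDifferent = cAB(1,2);
cAA = corrcoef(sig(:,1), sig(:,3));  scoreSame      = cAA(1,2);
disp([scoreSame, scoreDifferent])
```

    With these test signals the score for the two recordings of speaker A should come out close to 1 and the score for A against B much lower. Real speech would need framing and far more care than this.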

    Dave
     
  5. CoolZone

    Thread Starter New Member

    Nov 3, 2007
    4
    0
    can you please give me a sample to look at ?
     
  6. Dave

    Retired Moderator

    Nov 17, 2003
    6,960
    144
    As an example:

    Stages 1-6: http://www.andrew.cmu.edu/user/jterlesk/robotics/voice/soundSig.m

    Summing stage: http://www.andrew.cmu.edu/user/jterlesk/robotics/voice/run.m

    This code is very crude, but it will allow you to distinguish between different voices, i.e. tell that voice A is different from voice B. That is the limit of what it does. I would also be interested in seeing how this code fares against a large data sample.

    Dave
     
  7. CoolZone

    Thread Starter New Member

    Nov 3, 2007
    4
    0
    I already knew this code. I really hoped you had something better than this one.
     
  8. Dave

    Retired Moderator

    Nov 17, 2003
    6,960
    144
    You asked for a sample and a sample is what you got. Perhaps you could contribute something to this discussion, such as some of your own original work. At the moment you seem to be under some illusion as to what you are going to get out of this.

    Dave
     
  9. GU_dx

    New Member

    Nov 10, 2007
    1
    0
    Hi, I'm also interested in this project since I also need to do a project on audio. I understood the basic algorithm to be used, but what else could be done to make the voice recognition program better and more efficient? I'm not asking for code, just ideas, so that I can try to code/implement them.
    Thanks
     
  10. afab1986

    New Member

    Dec 14, 2007
    3
    0
    Hello everybody,
    First of all, I am a new member here and I'm really interested in what you are all discussing.
    The process that Dave described is good, but could anybody explain the operations in Dave's process above mathematically, I mean with equations?
    I ask because I haven't taken a DSP course yet.
    In fact, I may include this idea in my graduation project if I get it right.
    Keep up the good work,
     
  11. Dave

    Retired Moderator

    Nov 17, 2003
    6,960
    144
    Hi,

    Many of the functions and mathematical operations can be implemented using built-in Matlab functions, which lets the user abstract away from some of the complexities.

    The only real mathematical operation is the FFT. Have you come across this before? Have you done the standard Fourier Transform? I'll explain it if you are unsure about how it works.

    Dave
     
  12. afab1986

    New Member

    Dec 14, 2007
    3
    0
    Hi Dave,
    "Have you done the standard Fourier Transform?"
    Yes, I have, in my signal analysis and communication courses, but not the FFT yet.
    I'm supposed to take the DSP course, which includes the FFT, next semester.
    What I want is to understand the Matlab code posted previously and get the idea mathematically, i.e. what is the relation between the matrix we get from the FFT after "standardization" using the "bins" and the correlation that the code uses for comparing and judging the voices?
    I hope you get the idea.
    Thanks for your help.
     
  13. Dave

    Retired Moderator

    Nov 17, 2003
    6,960
    144
    The FFT is just an efficient implementation of the Discrete Fourier Transform (DFT). It achieves this by removing redundancy from the computation, exploiting the properties of the roots of unity (a Google search will explain what they are). Basically, it works on the principle that some values in the DFT calculation are already known and need not be calculated again explicitly. The important point is that the FFT gives exactly the same result as the DFT.
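
    As a small illustration (a sketch with an arbitrary 8-point test vector, not part of the sample code), the DFT can be evaluated directly from its definition and compared with fft. The matrix W is built from powers of a single root of unity w; since w^N = 1, each entry depends only on mod(k*n, N), which is the redundancy the FFT exploits.

```matlab
% Direct DFT of a short test vector, compared with the built-in fft.
N = 8;
x = (1:N)';                          % arbitrary test signal
n = 0:N-1;
w = exp(-2i*pi/N);                   % primitive N-th root of unity
W = w .^ (n' * n);                   % DFT matrix: W(k+1,n+1) = w^(k*n)
X_dft = W * x;                       % direct evaluation, about N^2 multiplications
X_fft = fft(x);                      % fast algorithm, about N*log2(N) operations
maxDiff = max(abs(X_dft - X_fft));   % the two agree up to rounding error
disp(maxDiff)
```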

    When you perform the DFT you are mapping a time-domain signal (such as a voice signal) into the frequency domain. When we consider voice comparison we are looking not at pitch similarities (which to the human ear are similar for different people) but at a match of frequency components in the sound output. So where person A may sound, to you and me, like person B, if we were to map the audio into the frequency domain we would expect to see distinct differences that mark one from the other.

    The conjugation and absolutes are nothing more than crude conditioning operations in order to allow for a comparison from one to another.

    Why split into bins? A bin here is a set of adjacent values from the FFT matrix that together contain the energy (or effective voltage) from a frequency range, not from a single frequency. Single frequency components are not of much use because they are affected by other, often experiment-related, variables; we could say there is safety in numbers. It is important to stress that too small a bin size is useless for comparative purposes, whereas too large a bin dilutes the result. There are lots of sources on how to determine your bin size in relation to your frequency range. You also need to average the (energy) values across the bin to obtain a single value for that bin.

    Finally you divide the FFT matrix by the average value for the bin within that frequency range. This tells you how your FFT value (the frequency mapping for your chosen voice sample) at a particular frequency component compares with the average for that bin. Plotting energy against bin/frequency will show you a mapping from which you can compare two voices for similarity.
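
    Written as equations, the chain of operations might look like the following. The notation is an assumption introduced here for illustration: x[n] is the sampled signal of length N, M is the number of FFT values per bin, J is the number of bins, and the last line follows step 7 of the list earlier in the thread (division by the sum).

```latex
% DFT (what fft computes)
X[k] = \sum_{n=0}^{N-1} x[n] \, e^{-2\pi i k n / N}, \qquad k = 0, \dots, N-1

% Multiplication by the complex conjugate gives a real, non-negative power value
P[k] = X[k] \, \overline{X[k]} = |X[k]|^2

% Average power in bin j, each bin holding M adjacent values
B_j = \frac{1}{M} \sum_{k=(j-1)M}^{jM-1} P[k], \qquad j = 1, \dots, J

% Standardised signature: the entries s_j sum to 1
s_j = \frac{B_j}{\sum_{m=1}^{J} B_m}
```

    Two voices are then compared through their signatures s, for example by correlating them.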

    As I stated previously, this is a crude method that will allow you to distinguish between two different people. There are many further tweaks and analysis techniques you can implement to make the recognition package better, but hopefully this will give you a start.

    Dave
     
  14. BlackBox

    Member

    Apr 22, 2007
    20
    0
    I would suggest using cepstral analysis instead of the FFT alone. It is much easier to recognise patterns arising from the physiology of the vocal tract in the cepstral domain.

    Cepstrum

    If you are an IEEE member you could search IEEE Xplore for a plethora of cepstrum-related voice recognition papers.

    Good work
     
  15. Dave

    Retired Moderator

    Nov 17, 2003
    6,960
    144
    Yes, it is certainly worth considering. You will still need to take the Fourier Transform (FFT) if you are using cepstral analysis, as it is an explicit part of the calculation. From my brief browsing of IEEE Xplore, there is plenty of evidence to suggest that it is a suitable tool for voice recognition.
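
    To show where the FFT enters, here is a minimal sketch of the real cepstrum, c = ifft(log(abs(fft(x)))), applied to one synthetic frame. Everything specific in it is an assumption for illustration: the sample rate, the frame length, the 125 Hz harmonic test signal standing in for voiced speech, the hand-written Hamming window and the lag range searched for the pitch peak.

```matlab
% Real cepstrum of one synthetic frame.
fs = 8000;                      % assumed sample rate in Hz
N  = 1024;                      % frame length in samples
f0 = 125;                       % assumed pitch of the synthetic "voice" in Hz
n  = (0:N-1)';
x  = zeros(N, 1);
for h = 1:31                    % harmonics of f0 below fs/2
    x = x + sin(2*pi*h*f0*n/fs) / h;
end
win = 0.54 - 0.46*cos(2*pi*n/(N-1));     % Hamming window written out by hand
X   = fft(x .* win);                     % the FFT is still the first step
c   = real(ifft(log(abs(X) + eps)));     % real cepstrum (eps guards log(0))

lags = 20:120;                           % search lags of 2.5 ms to 15 ms
[peakVal, idx] = max(c(lags + 1));       % c(1) corresponds to lag 0
pitchLag = lags(idx);                    % expected near fs/f0 samples
pitchHz  = fs / pitchLag;
disp([pitchLag, pitchHz])
```

    The low-lag part of c describes the smooth spectral envelope (related to the vocal tract), while the peak at higher lag reflects the pitch period, which is why the two are easier to separate here than in the raw spectrum.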

    Dave
     
  16. afab1986

    New Member

    Dec 14, 2007
    3
    0
    hi Dave,
    I will start researching and studying this project as soon as possible because I liked the idea and hopefully I will implement it in my future graduation project
    Thanx again.
    & KEEP UP THE GOOD WORK.
     
  17. Dave

    Retired Moderator

    Nov 17, 2003
    6,960
    144
    Good luck with your research and project. Keep us posted on how it goes, I'd be interested in seeing your project come along. Also feel free to ask any further questions, if we can help we will.

    Dave
     
  18. d_devil

    New Member

    Dec 20, 2008
    1
    0
    Hey, I've just started reading about speech recognition systems, and before starting to write code myself I wanted to see the sample code someone mentioned above.

    Well, that page seems to have been removed, so can anyone help me in this regard?
     
  19. Dave

    Retired Moderator

    Nov 17, 2003
    6,960
    144
    The two codes previously referenced can be retrieved from web archive:

    http://web.archive.org/web/20070902...u.edu/user/jterlesk/robotics/voice/soundSig.m

    http://web.archive.org/web/20070902...ew.cmu.edu/user/jterlesk/robotics/voice/run.m

    The code is very crude and I would suggest you look at more advanced techniques (I have elaborated on some of them in this thread). But these codes are a good starting point.

    Dave
     
  20. pulkit.143

    New Member

    Mar 23, 2009
    3
    0
    Hey Dave,
    I have imported the sound file directly into Matlab and now I want to plot its FFT. I'm trying some code but it's not working. I'm posting the code here; can you tell me where I am going wrong?


    [data,fs,nbits] = wavread("host.wav"); % Read wav file
    data_fft = fft(data); % Perform FFT
    P_data_fft = data_fft.* conj(data_fft) / size(data_fft,2); % Deduce Power Spectra
    f = 1000*(0:(size(data_fft,2)))/size(data_fft,2); % Define frequency range over which to plot power spectra. This is half the size of the fft since there is merely a reflection around the dc point
    plot(f,P_data_fft(1:(size(data_fft,2)+1))) % Plot
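
    For reference, the code above most likely fails for three reasons. First, in Matlab 7 string literals need single quotes, so wavread("host.wav") is a syntax error. Second, wavread returns samples down the rows, so size(data_fft,2) is the number of channels (1 for mono), not the number of samples. Third, the factor 1000 hard-codes a sample rate instead of using fs. A minimal corrected sketch follows; it uses a synthetic 440 Hz tone in place of host.wav (an assumption, so that it runs without the file).

```matlab
% Synthetic stand-in for the wav data so the sketch runs without a file.
% With the real file: [data, fs, nbits] = wavread('host.wav');
% (use audioread in newer MATLAB releases, where wavread is gone)
fs   = 8000;                                  % assumed sample rate in Hz
data = sin(2*pi*440*(0:fs-1)'/fs);            % 1 s test tone as a column vector

data = data(:, 1);                            % first channel only
N    = size(data, 1);                         % samples run down the rows
data_fft   = fft(data);                       % perform FFT
P_data_fft = real(data_fft .* conj(data_fft)) / N;   % power spectrum
half = floor(N/2) + 1;                        % points from DC up to Nyquist
f    = fs * (0:half-1)' / N;                  % frequency axis in Hz, from fs
plot(f, P_data_fft(1:half))                   % upper half is only a mirror image
xlabel('Frequency (Hz)'); ylabel('Power');

[pk, k] = max(P_data_fft(1:half));            % strongest component
peakHz  = f(k);                               % should sit at the tone frequency
```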
     