Voice recognition

Thread Starter

kunal juneja

Joined Mar 6, 2012
9
Hi guys, i am making voice recognition software using MATLAB. i have been given this as a training project, but i am new to this and know very little about it. please help me out with this.
 

panic mode

Joined Oct 10, 2011
2,753
LOL, so true... this is far from trivial; teams of experts have spent tons of time getting this to work.

here are some concerns:

first of all, MATLAB code is interpreted and it can be very slow. if you are interested in speech recognition, you will need something with lots of processing power. an interpreter will drag down even expensive hardware.

then you need to define constraints (what is to be accomplished).
do you need to recognise 2-3 commands or entire random sentences?

are you expecting to recognise the voice of one and the same speaker, or is there a variety of voices that this thing should handle?

is there supposed to be a clear indication of the beginning and end of each word? how will the computer know when each word starts and ends?

in the simplest form, a basic FFT could be enough to distinguish a few commands.
an FFT gives you the frequency spectrum of a time-domain signal.
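
a minimal sketch of what that looks like in MATLAB, assuming a synthetic 440 Hz tone stands in for a recording. the sampling rate, tone frequency and variable names are made up for illustration:

```matlab
% Sketch: one-sided magnitude spectrum of a synthetic test tone.
% Fs, the tone frequency and the duration are illustrative assumptions.
Fs = 8000;                        % sampling rate in Hz
t  = (0:Fs-1)' / Fs;              % one second of sample times
x  = sin(2*pi*440*t);             % 440 Hz tone standing in for recorded speech

N    = length(x);
X    = fft(x);                    % complex spectrum, N bins
mag  = abs(X(1:N/2+1)) / N;       % keep DC up to the Nyquist bin
mag(2:end-1) = 2 * mag(2:end-1);  % fold in the mirrored negative-frequency half
freq = (0:N/2)' * Fs / N;         % frequency in Hz of each kept bin

[peakMag, idx] = max(mag);        % strongest component of the signal
peakFreq = freq(idx);
fprintf('strongest component: %.1f Hz (magnitude %.2f)\n', peakFreq, peakMag);
```

for real speech the spectrum is much messier than a single spike, so this only shows the mechanics of getting from samples to frequencies.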

good luck...
 

Thread Starter

kunal juneja

Joined Mar 6, 2012
9
hey panic mode, thanks. and yes, i am not making anything complex. the recognition will be of my voice only: when i say "1" it will match it against the saved template and write the result on the computer.

and there will be a clear indication of the beginning and end of each word. i am only doing the numbers 1-9.

yes, i came to know that performing an FFT should sort out the problem. but when i write the command in MATLAB to read the WAV file, it does not accept the command, even though i have written the path as well. i don't know what the problem is.
 

Thread Starter

kunal juneja

Joined Mar 6, 2012
9
hey panic, thanks a lot for giving this your attention, but when i write this code:

% read in sound file and determine length and period
[signal, Fs, bits_per_sample]=wavread('speech.wav');
an error comes up at line 67, in a new tab, saying the file is invalid. i really appreciate that you are helping me and giving your time to this project.
 

panic mode

Joined Oct 10, 2011
2,753
i have no idea what the 67th line is; this code is short. you do have to make sure that you replace speech.wav with your own file name. if that file is not in the same folder as the MATLAB code, you have to specify the full path, something like

C:\Users\myself\Desktop\fancy_command.wav
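
a minimal sketch of that check, assuming a newer MATLAB release where audioread and audiowrite replace wavread and wavwrite. the file name and the temp folder are placeholders, and the test tone is only written so that the example runs on its own:

```matlab
% Sketch: build a full path, confirm the file exists, then read it.
% The folder and file name are placeholders, not anything from your project.
Fs      = 8000;
tone    = 0.5 * sin(2*pi*440*(0:Fs-1)'/Fs);
wavPath = fullfile(tempdir, 'fancy_command.wav');
audiowrite(wavPath, tone, Fs);           % stand-in for your own recording

% exist(...,'file') returns 2 when the name resolves to a file
if exist(wavPath, 'file') ~= 2
    error('File not found: %s', wavPath);
end

[signal, FsRead] = audioread(wavPath);   % on old releases: wavread(wavPath)
fprintf('read %d samples at %d Hz\n', length(signal), FsRead);
```

checking the path first gives a readable message of your own instead of an error raised from inside the reader function.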
 

Thread Starter

kunal juneja

Joined Mar 6, 2012
9
hey, thanks again panic. i have read that you have to make a training stage and a testing stage, so this is the code i am writing for the training stage:

s = 8000; % Sampling Freq (Hz)
Duration = 10; % Duration (sec)
y = wavrecord(Duration*s, s); % first argument is a sample count: 10 s at 8000 Hz
framesize = 80; % Framesize (samples)
Fs = 8000; % Sampling Frequency (Hz)
RUNNING = 1; % A flag to continue data capture

% Setup data acquisition from sound card
ai = analoginput('winsound');
addchannel(ai, 1);

% Configure the analog input object.
set(ai, 'SampleRate', Fs);
set(ai, 'SamplesPerTrigger', framesize);
set(ai, 'TriggerRepeat',inf);
set(ai, 'TriggerType', 'immediate');

% Start acquisition
start(ai)

% Keep acquiring data while "RUNNING" ~= 0
while RUNNING
% Acquire new input samples
newdata = getdata(ai,ai.SamplesPerTrigger);
% Do some processing on newdata



% Set RUNNING to zero if we are done


end

% Stop acquisition
stop(ai)
delete(ai)

hey panic, please tell me whether i am on the right path.
 

panic mode

Joined Oct 10, 2011
2,753
never tried to use MATLAB to actually read the data straight from the soundcard, but:

your code seems to focus on only one thing - collecting a data sample.
that is ok (it is one of the needed steps) but it is a small part of the problem.
the bigger (actual) problem is to process/evaluate the data.
 

Thread Starter

kunal juneja

Joined Mar 6, 2012
9
hey, thanks once again mr. panic. yes, this is for the training stage. i want to know what i should do next, meaning how to make the testing stage. if i speak "1", how will it do the matching?
 

panic mode

Joined Oct 10, 2011
2,753
well, this is where you need to jump in, make some choices, then implement them and see if the result is good enough. if not, try something else and keep improving until you get good results.

for example, you can use an FFT, sort the peaks by magnitude and/or focus on the ones above some threshold. you get a magnitude and a frequency for each peak. you can have a lookup table (LUT) of your commands with the expected frequencies and magnitudes (assuming you have either a working teach function or someone to manually compile sample data into a table).

then, whenever a command is received, find the frequencies and magnitudes of the current command and look in the table for the closest match.
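
a rough sketch of the peak-picking step, assuming a synthetic three-tone signal in place of a recorded command. the tone frequencies, the threshold and the choice of three peaks are all made-up values, and the neighbour test is only one simple way to define a peak:

```matlab
% Sketch: keep the three strongest spectral peaks of a signal.
Fs = 8000;
t  = (0:Fs-1)' / Fs;
% synthetic "command": three tones with made-up frequencies and amplitudes
x  = 0.51*sin(2*pi*450*t) + 0.35*sin(2*pi*880*t) + 0.05*sin(2*pi*2100*t);

N    = length(x);
X    = fft(x);
mag  = 2 * abs(X(1:N/2)) / N;     % one-sided magnitudes (DC is doubled too,
                                  % which does not matter for a zero-mean signal)
freq = (0:N/2-1)' * Fs / N;       % frequency in Hz of each bin

% a bin counts as a peak if it beats both neighbours and a threshold
threshold = 0.01;
isPeak = [false; mag(2:end-1) > mag(1:end-2) & mag(2:end-1) > mag(3:end); false];
isPeak = isPeak & mag > threshold;

peakFreq = freq(isPeak);
peakMag  = mag(isPeak);
[peakMag, order] = sort(peakMag, 'descend');   % strongest first
peakFreq = peakFreq(order);

nKeep   = min(3, numel(peakMag));
topFreq = peakFreq(1:nKeep);
topMag  = peakMag(1:nKeep);
[topFreq, byFreq] = sort(topFreq);             % order by frequency for the table
topMag  = topMag(byFreq);

features = [topFreq' topMag'];                 % [f1 f2 f3 m1 m2 m3]
disp(features);
```

real speech will not land on exact FFT bins like these tones do, so expect smeared peaks and some trial and error with the threshold.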

suppose your LUT is

f1    f2    f3    m1    m2    m3     command
800   900   1100  0.24  0.33  0.15   START
450   880   2100  0.51  0.35  0.05   STOP
710   1200  1800  0.42  0.22  0.42   TOGGLE
etc.
(f1-f3 are the peak frequencies, m1-m3 the corresponding magnitudes.)

then if you get a signal that (once processed using the FFT) gives the values:
463 873 1948 0.54 0.36 0.016

you are supposed to determine which command was the most likely one.
in this case, a good candidate is the STOP command.

the actual matching process can be anything (covariance, vectors, sums of errors etc.); you need to pick one and try it out.
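
as one possible matching rule, here is a sketch using a sum of squared errors on the table above. scaling each column by its largest table entry is my own assumption, made so that the frequencies (taken to be in Hz) do not swamp the magnitudes. the variable names are made up:

```matlab
% Sketch: closest-match lookup using a sum of squared errors.
% Rows are commands, columns are f1 f2 f3 (assumed Hz) and m1 m2 m3.
lut = [800  900 1100 0.24 0.33 0.15;
       450  880 2100 0.51 0.35 0.05;
       710 1200 1800 0.42 0.22 0.42];
commands = {'START', 'STOP', 'TOGGLE'};

measured = [463 873 1948 0.54 0.36 0.016];   % values from the example above

% scale every column by its largest table entry so that frequencies and
% magnitudes contribute on a comparable footing (one choice among many)
nRows = size(lut, 1);
scale = max(abs(lut), [], 1);
lutScaled      = lut ./ repmat(scale, nRows, 1);
measuredScaled = measured ./ scale;

% squared error between the measurement and every table row
diffs = lutScaled - repmat(measuredScaled, nRows, 1);
sse   = sum(diffs.^2, 2);

[bestErr, bestRow] = min(sse);
bestCommand = commands{bestRow};
fprintf('closest match: %s (error %.4f)\n', bestCommand, bestErr);
```

with these numbers the STOP row has the smallest error. you would also want to reject the input when even the best error is too large, otherwise every noise burst gets mapped to some command.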
 

Thread Starter

kunal juneja

Joined Mar 6, 2012
9
hey, thanks panic, but can i use this code?

% This program records the voice
function [norm_voice,h] = Voice_Rec(sample_freq)
option = 'n';
option_rec = 'n';
record_len = 1; %Record time length in seconds
%sample_freq = 8192; %Sampling frequency in Hertz
sample_time = sample_freq * record_len; %Number of samples to record

disp('Get ready to record your voice')
name = input('Enter the file name you want to save the file with: ','s');
file_name = sprintf('%s.wav',name);
option_rec = input('Press y to record: ','s');
if option_rec=='y'
    while option=='n',
        input('Press enter when ready to record--> ');
        record = wavrecord(sample_time, sample_freq); %Records the input through the sound card to the variable with specified sampling frequency
        input('Press enter to listen the recorded voice--> ');
        sound(record, sample_freq);
        option = input('Press y to save or n to record again: ','s');
    end
    wavwrite(record, sample_freq, file_name); %Save the recorded data to a file with the specified file name in .wav format
end
[voice_read,FS,NBITS]=wavread(file_name);
norm_voice = normalize(voice_read);
norm_voice = downsmpl(norm_voice, sample_freq);
le=32;
h=daubcqf(le,'min');


function vec = normalize(vec)
% Remove the mean and scale the vector to unit energy
temp_vec = vec-mean(vec);
sum_temp_vec = sum(temp_vec.*temp_vec);
sqrt_temp_vec = sqrt(sum_temp_vec);
vec = (1/sqrt_temp_vec)*temp_vec;

function sampled = downsmpl(voice, freq)
% Halve the sample count by combining each pair of adjacent samples
x=freq;
y = freq/2;
z=1;
a=1;
sampled=0;
while z<freq,
    sampled(a) = sqrt(abs(voice(z)*voice(z+1)));
    a=a+1;
    z = z+2;
end
sampled = sampled';


function [h_0,h_1] = daubcqf(N,TYPE)
% [h_0,h_1] = daubcqf(N,TYPE);
%
% Function computes the Daubechies' scaling and wavelet filters
% (normalized to sqrt(2)).
%
% Input:
%    N    : Length of filter (must be even)
%    TYPE : Optional parameter that distinguishes the minimum phase,
%           maximum phase and mid-phase solutions ('min', 'max', or
%           'mid'). If no argument is specified, the minimum phase
%           solution is used.
%
% Output:
%    h_0 : Minimal phase Daubechies' scaling filter
%    h_1 : Minimal phase Daubechies' wavelet filter
%
% Example:
%    N = 4;
%    TYPE = 'min';
%    [h_0,h_1] = daubcqf(N,TYPE)
%    h_0 = 0.4830 0.8365 0.2241 -0.1294
%    h_1 = 0.1294 0.2241 -0.8365 0.4830
%
if(nargin < 2),
    TYPE = 'min';
end;
if(rem(N,2) ~= 0),
    error('No Daubechies filter exists for ODD length');
end;
K = N/2;
a = 1;
p = 1;
q = 1;
h_0 = [1 1];
for j = 1:K-1,
    a = -a * 0.25 * (j + K - 1)/j;
    h_0 = [0 h_0] + [h_0 0];
    p = [0 -p] + [p 0];
    p = [0 -p] + [p 0];
    q = [0 q 0] + a*p;
end;
q = sort(roots(q));
qt = q(1:K-1);
if TYPE=='mid',
    if rem(K,2)==1,
        qt = q([1:4:N-2 2:4:N-2]);
    else
        qt = q([1 4:4:K-1 5:4:K-1 N-3:-4:K N-4:-4:K]);
    end;
end;
h_0 = conv(h_0,real(poly(qt)));
h_0 = sqrt(2)*h_0/sum(h_0); %Normalize to sqrt(2);
if(TYPE=='max'),
    h_0 = fliplr(h_0);
end;
if(abs(sum(h_0 .^ 2))-1 > 1e-4)
    error('Numerically unstable for this value of "N".');
end;
h_1 = rot90(h_0,2);
h_1(1:2:N)=-h_1(1:2:N);

please help me and tell me wherever i am wrong.
regards,
kunal
 

Thread Starter

kunal juneja

Joined Mar 6, 2012
9
hey panic, i am acquiring the voice and plotting the data, but i also want to sample it and save the data, and i don't know how to do that.
this is the code:
ai = analoginput('winsound');
addchannel(ai,1:2);
set(ai,'SampleRate',44100)
set(ai,'SamplesPerTrigger',44100)
start(ai)
data = getdata(ai);
plot(data)
i know this works. please help me, panic: i want to sample the acquired data and save it.
 

Thread Starter

kunal juneja

Joined Mar 6, 2012
9
hey panic, is there any problem in the code i have written above? when i run it, a graph is plotted. please reply soon.
and how can i use the FFT for matching?
regards,
kunal
 

panic mode

Joined Oct 10, 2011
2,753
don't ask me that. i am sorry, but i have neither the time nor the interest to even look at it.

disclaimer:
my participation here is purely voluntary and for my own entertainment. i do spend some time helping people get their project or idea off the ground - when i feel like it. but what you ask is work, and I don't like that. i do work, but only on things i like (or have to) or when a nice chunk of money is involved.

I don't want to leave you hanging, so let me TRY to help you:
you probably think that with that tiny code sample you may have something tangible, but i can say with confidence that you are very wrong. i see no structure, no planning, no analysis, no breakdown or description of tasks, no experiments, no tests etc. this is NOT what an engineering process looks like, and i am not going to get involved. when doing design, you need to be analytical and efficient. you need to conceptualize, plan, develop, categorize, modularize, implement, develop experiments and test procedures, execute them, analyse the results, evaluate each completed stage and decide whether to proceed with the next one or make adjustments and repeat the current one until it works properly. what you have here is a far cry from that.

this is what you SHOULD have done:
you are interested in matching input (a received or recorded signal). matching with what? what research have you done? have you determined what you are going to compare or match? it is easy to compare numbers (IF I=5 THEN ...) but how do you compare voice commands? how would you make a distinction between commands? and what are the commands? obviously some sort of voice input, but what exactly? a long sentence? i don't think so. a singing voice? probably not. perhaps a single word? i still don't think so (but you are free to go for it). how about a single letter or sound? that is probably more like it.

you may need to make a library of all possible sounds and then compare the input (also a single sound) with each one in the library and see if you can recognise it. so what would one sound consist of? easy peasy - run an experiment and get some values, preferably with different subjects, and find out what part of the signal is most useful. suppose you did find distinctive features of voice sounds, how would you represent them in MATLAB? suppose you decided on a particular representation (a vector or matrix of some sort), how would you populate the values of that representation? do you want to type the values in by hand? how long would it take to complete the list of all sounds?

suppose you decided that your library consists of all the letters of an alphabet, and for simplicity suppose each letter only makes one simple sound. what about when you close MATLAB? do you want to start making the library again every time you open the program? or would you rather make a library file, so you can save it once it is complete and load it every time you need to run the program? would a "teach" program make sense, to speed up populating the library? perhaps have the program tell you what sound to make and wait for you to make it? what if you made a mistake? suppose you got all this working and saved your own library, how about multiple users? could you create multiple libraries, one for each user? how about names (Alex and David have different voices, so each of them should have his own sound library)?

so suppose you got the library figured out: what to get from the input and how, how to represent it, how to store it, how to enter it, how to save it, how to read it. NOW you can start thinking about matching. and this is to match one sound from one and the same speaker (person).

suppose you are comparing a bunch of numbers you generated from the received signal (input) against data stored in the library. you need to iterate through the library in some way to make a number of comparisons. one way is using a loop (FOR NEXT, or whatever). i see no sign of it. suppose we only care about comparing two sets of numbers (arrays or vectors), how would you do that? one way to compare vectors (that potentially could be used here) is to find the magnitude of their difference. but is the data in your input and library normalized?

suppose all of this is done and works fine, how fast is it? does it take 30 seconds to crunch the numbers, or is it fast enough to recognize sounds in real time as the person speaks? what is the success rate? you think you are done? that was just a single sound, how about commands? do you have a command library? can you still make a reasonable prediction and match a spoken word to a word in the library even if one of the sounds was skipped or detected twice (maybe the person decided to speak slower)?
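
to make the library, loop and normalization ideas concrete, here is a minimal sketch. everything in it is hypothetical: the struct layout, the field names, the file name and the feature values are made up, and a real library would be filled by a teach program rather than typed in:

```matlab
% Sketch: save a sound library once, reload it, and find the closest entry.
% The labels and feature vectors are made-up placeholders.
library = struct('label',    {'one', 'two', 'three'}, ...
                 'features', {[1 0 0 2], [0 3 1 0], [2 2 0 1]});

libFile = fullfile(tempdir, 'kunal_library.mat');   % one file per user
save(libFile, 'library');                            % done once, after teaching
clear library

loaded  = load(libFile);                             % done at every start-up
library = loaded.library;

sample = [0.1 2.9 1.2 0];         % made-up features of a newly spoken sound
sample = sample / norm(sample);   % normalize so loudness does not matter

bestDist  = inf;
bestLabel = '';
for k = 1:numel(library)
    ref = library(k).features;
    ref = ref / norm(ref);        % same normalization as the input
    d   = norm(sample - ref);     % magnitude of the difference vector
    if d < bestDist
        bestDist  = d;
        bestLabel = library(k).label;
    end
end
fprintf('closest library entry: %s (distance %.3f)\n', bestLabel, bestDist);
```

a separate file per speaker covers the multiple-user case, and timing this loop with tic/toc is a quick way to see whether it could keep up in real time.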

summary:
this post is what the concept stage is all about - merely recognizing and sizing the problem (and this is just a small intro, it is nowhere near complete). how many of those questions have you asked yourself, and for how many of them did you have an answer before reading this post? do you see now how hopelessly far from working this code sample is? do you see the difference in approach? one does not just sit down, paste a few code snippets snatched from the internet (some 20-30 lines) and say "my voice recognition project is done, please test it for me". i hope you can at least understand why i and probably everyone else will not have the time or interest for this. i've just wasted a perfectly good hour telling you something that should have been learned long ago. i hope you get something out of it...

what made you embark on a project like this? how much of this project do you think you can complete on your own? i see zero initiative...
 