MATLAB code for emotion recognition in speech

Discussion in 'Programmer's Corner' started by ankitsshah1987, Aug 6, 2009.

  1. ankitsshah1987

    Thread Starter New Member

    Aug 6, 2009
Hey, please help as soon as possible. This is my final year project.
  2. steveb

    Senior Member

    Jul 3, 2008
    That doesn't sound easy to me.

    You'd better get going on an exhaustive literature search to figure out the current state of the art on this subject.

    8th European Conference on Speech Communication and Technology

    Geneva, Switzerland
    September 1-4, 2003


    Emotion Recognition by Speech Signals

    Oh-Wook Kwon, Kwokleung Chan, Jiucang Hao, Te-Won Lee

    University of California at San Diego, USA

    For emotion recognition, we selected pitch, log energy, formant, mel-band energies, and mel frequency cepstral coefficients (MFCCs) as the base features, and added velocity/ acceleration of pitch and MFCCs to form feature streams. We extracted statistics used for discriminative classifiers, assuming that each stream is a one-dimensional signal. Extracted features were analyzed by using quadratic discriminant analysis (QDA) and support vector machine (SVM). Experimental results showed that pitch and energy were the most important factors. Using two different kinds of databases, we compared emotion recognition performance of various classifiers: SVM, linear discriminant analysis (LDA), QDA and hidden Markov model (HMM). With the text-independent SUSAS database, we achieved the best accuracy of 96.3% for stressed/neutral style classification and 70.1% for 4-class speaking style classification using Gaussian SVM, which is superior to the previous results. With the speaker-independent AIBO database, we achieved 42.3% accuracy for 5-class emotion recognition.
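The velocity/acceleration streams mentioned in the abstract above are usually computed as regression-based delta features over each 1-D stream. A minimal sketch in Python/NumPy for concreteness (the function name and window width are my own choices, not the paper's; porting to MATLAB is straightforward):

```python
import numpy as np

def delta(stream, width=2):
    """Regression-based delta ('velocity') of a 1-D feature stream,
    e.g. a pitch contour or one MFCC coefficient over time."""
    x = np.asarray(stream, dtype=float)
    padded = np.pad(x, width, mode='edge')  # repeat edge frames
    denom = 2.0 * sum(k * k for k in range(1, width + 1))
    d = np.zeros_like(x)
    for k in range(1, width + 1):
        # padded[width + t] corresponds to x[t]
        d += k * (padded[width + k: width + k + len(x)]
                  - padded[width - k: width - k + len(x)])
    return d / denom
```

Applying `delta()` once to a pitch or MFCC stream gives the velocity stream; applying it again gives the acceleration stream.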

    Sixth International Conference on Spoken Language Processing
    (ICSLP 2000)

    Beijing, China
    October 16-20, 2000

    Emotion Recognition in Speech Signal: Experimental Study, Development, and Application

    Valery A. Petrushin

    Center for Strategic Technology Research (CSTaR), Andersen Consulting, Northbrook, IL, USA

The paper describes an experimental study on vocal emotion expression and recognition and the development of a computer agent for emotion recognition. The study deals with a corpus of 700 short utterances expressing five emotions: happiness, anger, sadness, fear, and normal (unemotional) state, which were portrayed by thirty subjects. The utterances were evaluated by twenty-three subjects, twenty of whom participated in recording. The accuracy of recognizing emotions in speech is the following: happiness - 61.4%, anger - 72.2%, sadness - 68.3%, fear - 49.5%, and normal - 66.3%. The human ability to portray emotions is approximately at the same level (happiness - 59.8%, anger - 71.7%, sadness - 68.1%, fear - 49.7%, and normal - 65.1%), but the standard deviation is much larger. The human ability to recognize one's own emotions has also been evaluated. It turned out that people are good at recognizing anger (98.1%), sadness (80%) and fear (78.8%), but are less confident for normal state (71.9%) and happiness (71.2%). A part of the corpus was used for extracting features and training computer-based recognizers. Some statistics of the pitch, the first and second formants, energy and the speaking rate were selected, and several types of recognizers were created and compared. The best results were obtained using ensembles of neural network recognizers, which demonstrated the following accuracy: normal state - 55-75%, happiness - 60-70%, anger - 70-80%, sadness - 75-85%, and fear - 35-55%. The total average accuracy is about 70%. An emotion recognition agent was created that is able to analyze telephone-quality speech signal and distinguish between two emotional states -- "agitation" and "calm" -- with an accuracy of 77%. The agent was used as part of a decision support system for prioritizing voice messages and assigning a proper human agent to respond to the message in a call-center environment. The architecture of the system is presented and discussed.
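Note that Petrushin's best numbers come from *ensembles* of recognizers. The combination step itself is just majority voting; a toy sketch (the `models` here are placeholder callables of my own invention, standing in for the paper's trained neural networks):

```python
from collections import Counter

def ensemble_predict(models, features):
    """Majority vote over an ensemble of recognizers. Each model is
    any callable mapping a feature vector to an emotion label."""
    votes = [m(features) for m in models]
    return Counter(votes).most_common(1)[0][0]
```

In MATLAB you would do the same with a cell array of trained networks and `mode()` over their label outputs.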
    Last edited: Aug 9, 2009
  3. KL7AJ

    AAC Fanatic!

    Nov 4, 2008
    This is such a NEW technology, I don't think many people are qualified to give real practical advice. On the other hand, isn't it the job of a thesis to blaze new trails? :)

    I would approach the problem like this. Do a lot of sampling of voices with different emotional content and look for mathematically distinguishing PATTERNS. This is what Matlab is good at. It's NOT going to be anywhere near as easy as looking for, say, phonemes.

    Probably the most prevalent characteristic would be an overall change in PITCH, which can be done with spectral analysis.
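Picking up on the pitch point: a common first cut is short-time autocorrelation rather than full spectral analysis. A rough sketch in Python/NumPy (the 80-400 Hz search range is my assumption for adult speech, and this ignores voicing detection; the same idea translates directly to MATLAB's `xcorr`):

```python
import numpy as np

def estimate_pitch(frame, fs, fmin=80.0, fmax=400.0):
    """Crude pitch estimate for one voiced frame via autocorrelation.
    Returns the frequency (Hz) of the strongest lag in [fmin, fmax]."""
    frame = frame - np.mean(frame)                     # remove DC offset
    ac = np.correlate(frame, frame, mode='full')[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)            # lag search window
    lag = lo + int(np.argmax(ac[lo:hi]))
    return fs / lag
```

Tracking this estimate frame by frame gives the pitch contour whose overall changes KL7AJ is describing.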

    Probably not much help, but maybe it can get you started.

  4. lopamudra misra

    New Member

    Aug 28, 2010
OK, I can give you a rough idea of this. The first step is to decide which emotions you want to recognize: angry, sad, happy, neutral, etc. Then collect speech samples (corpora) depicting these emotions accordingly. From these speech samples extract features such as the mean, standard deviation, and range of the fundamental frequency (pitch), the first and second formants, and energy (amplitude).
    Then use a classifier like a neural network, k-means clustering, etc. to classify the emotions on the basis of the values of these speech features.
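Putting those steps together, here is one minimal end-to-end sketch (Python/NumPy for concreteness, easy to port to MATLAB). A nearest-centroid classifier stands in for the neural network / k-means suggestion, and the feature vector here holds energy statistics only; pitch and formant statistics would be appended the same way:

```python
import numpy as np

def utterance_features(signal, frame_len=400, hop=200):
    """Step 3: per-utterance statistics (mean, std, range) of
    short-time frame energy. A fuller system would append the same
    statistics for pitch and the first two formants."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    energy = np.array([float(np.sum(f * f)) for f in frames])
    return np.array([energy.mean(), energy.std(), np.ptp(energy)])

def train_centroids(X, y):
    """Step 4 (training): one mean feature vector per emotion class."""
    labels = sorted(set(y))
    y = np.asarray(y)
    return labels, np.array([X[y == c].mean(axis=0) for c in labels])

def classify(x, labels, centroids):
    """Step 4 (prediction): pick the class with the nearest centroid."""
    return labels[int(np.argmin(np.linalg.norm(centroids - x, axis=1)))]
```

With real corpora you would build `X` by calling `utterance_features` on every labeled recording, then evaluate on held-out speakers.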