Audio Signal Separation - Identifying Interest points

Thread Starter


Joined Sep 17, 2013
Sorry if this is the wrong forum. I'm having some difficulty with the following problem:

Suppose I have an audio file (.wav) that says "My name is Phorce", which looks something like the attachment. What I want to do is split this signal so I can identify each individual word.

I can obtain the duration of the audio file by computing "duration = frames/rate", but I do not know a good, accurate way to split the signal, since I do not want to cut through a word and thus miss it. A word could be of any length, and I have to determine whether a given piece is actually a word or just white noise.
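For reference, the "duration = frames/rate" calculation can be sketched with Python's standard `wave` module. This is only an illustration: the function name `wav_duration_seconds` is made up here, and the file is assumed to be an uncompressed WAV that the `wave` module can open.

```python
import wave

def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds, computed as frames / rate."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()    # total number of sample frames in the file
        rate = wf.getframerate()    # sample frames per second
    return frames / float(rate)
```

The same ratio converts in the other direction too: a time in seconds multiplied by `rate` gives the index of the corresponding sample frame, which is what any splitting scheme ends up working with.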

My problem at the moment is splitting the signal into equal blocks so that the different words can be captured.

Could anyone suggest anything that may help?

Please feel free to ask if I haven't been clear.


Thread Starter


Joined Sep 17, 2013
Yes, but without prior knowledge of the words. Basically, I need to split the signal up into different components so that I can identify whether or not each one is a word. But I don't know how to split the signal automatically so that I don't miss any words out. Does this make sense?

So let's assume I have the following sentence as a 1D signal: "My name is phorce" It would then become:

vect[0] = "My",
vect[1] = "Name",
vect[2] = "Is",
vect[3] = "Phorce"

Does this make sense?


Joined May 11, 2009
Yes, it makes a lot of sense. That chart of yours looks like MATLAB, if I am not wrong, and I do not use that software. What I would have done is look for periods of silence as a start, perhaps using some kind of threshold detector.
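A rough sketch of such a threshold detector, in plain Python rather than MATLAB. Everything here is an assumption for illustration: the helper names `frame_rms` and `silent_runs`, the use of RMS level per fixed-length frame, and the idea that a gap only counts as silence if it lasts at least `min_frames` frames. The threshold itself would have to be tuned against the noise floor of the actual recording.

```python
import math

def frame_rms(samples, frame_len):
    """RMS level of each consecutive, non-overlapping frame of frame_len samples."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        levels.append(math.sqrt(sum(s * s for s in frame) / frame_len))
    return levels

def silent_runs(levels, threshold, min_frames):
    """Return (first_frame, end_frame_exclusive) for every run of at least
    min_frames consecutive frames whose level is below threshold."""
    runs = []
    run_start = None
    for i, level in enumerate(levels):
        if level < threshold:
            if run_start is None:
                run_start = i          # a quiet run begins here
        else:
            if run_start is not None and i - run_start >= min_frames:
                runs.append((run_start, i))
            run_start = None           # a loud frame ends any quiet run
    # Handle a quiet run that extends to the end of the signal.
    if run_start is not None and len(levels) - run_start >= min_frames:
        runs.append((run_start, len(levels)))
    return runs
```

The frames between two silent runs are then the candidate words. Multiplying a frame index by `frame_len` gives the sample position in the original signal.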


Joined Mar 31, 2012
It looks like there is a fifth word there. Are you sure it isn't "Hi, my name is Phorce."?

You said you wanted to split the signal into equal blocks so that different words can be captured. Well, if the words aren't of the same duration, how is this possible?

It would appear that your problem is going to hinge on identifying low-signal periods between words and between parts of words. I'm somewhat surprised that the signal appears to drop off as much as it does between consonants. Perhaps if you examine the signal more carefully you will discover that there is more signal in the intraword gaps than it looks like.
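One way to examine the gaps more carefully is to look at the level on a decibel scale, since low-level signal is hard to see on a linear amplitude plot. The following is a minimal sketch under stated assumptions: 16-bit samples (so full scale is taken as 32768), non-overlapping frames, and a made-up function name `frame_levels_db`.

```python
import math

def frame_levels_db(samples, frame_len, full_scale=32768.0, floor_db=-120.0):
    """Per-frame RMS level in dB relative to full scale.

    Frames that are exactly zero (or fall below floor_db) are reported as floor_db.
    """
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        if rms > 0:
            levels.append(max(floor_db, 20.0 * math.log10(rms / full_scale)))
        else:
            levels.append(floor_db)
    return levels
```

If the frames inside a word still sit well above the level of the true pauses between words, that difference is what a threshold has to separate.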

Austin Clark

Joined Dec 28, 2011
You could search for peaks in the WAV file (which is easy to do because WAV files usually store plain PCM samples). Whenever you find a peak that exceeds a threshold you set, you mark that as a start point (maybe a little earlier, to be safe), and you mark a stop point when there are no more peaks above your threshold for a certain amount of time. Once you have all of the segments you want to keep, you move each of them to a new WAV file.
This would require the words in your audio clip to be broken up by pauses, plus some programming and knowledge of the WAVE file format, but it honestly shouldn't be too hard and ought to do a pretty good job.
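The procedure described above could be sketched roughly as follows, using only Python's standard library. Treat it as an illustration under assumptions, not a finished tool: the names `find_segments` and `write_segments`, the parameters `hold_s` (how long the signal must stay below the threshold before a segment is closed) and `pre_roll_s` (how far before the first peak a segment starts), and their default values are all invented here, and the file-writing part assumes mono 16-bit PCM.

```python
import array
import sys
import wave

def find_segments(samples, rate, threshold, hold_s=0.2, pre_roll_s=0.05):
    """Return (start, end_exclusive) sample indices of the loud segments."""
    hold = int(hold_s * rate)
    pre_roll = int(pre_roll_s * rate)
    segments = []
    start = None
    last_loud = None
    for i, s in enumerate(samples):
        if abs(s) > threshold:
            if start is None:
                # Start a little before the peak, but never inside the previous segment.
                prev_end = segments[-1][1] if segments else 0
                start = max(prev_end, i - pre_roll)
            last_loud = i
        elif start is not None and i - last_loud >= hold:
            # No peak above the threshold for `hold` samples: close the segment.
            segments.append((start, last_loud + 1))
            start = None
    if start is not None:
        segments.append((start, last_loud + 1))
    return segments

def write_segments(in_path, out_prefix, threshold, hold_s=0.2, pre_roll_s=0.05):
    """Split a mono 16-bit PCM WAV file and write each segment to its own file."""
    with wave.open(in_path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles mono 16-bit PCM")
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    paths = []
    for n, (a, b) in enumerate(find_segments(samples, rate, threshold, hold_s, pre_roll_s)):
        path = "%s%d.wav" % (out_prefix, n)
        with wave.open(path, "wb") as out:
            out.setnchannels(1)
            out.setsampwidth(2)
            out.setframerate(rate)
            out.writeframes(raw[2 * a:2 * b])  # 2 bytes per 16-bit sample
        paths.append(path)
    return paths
```

A `hold_s` that is too short will split a single word at its quiet consonants, while one that is too long will merge neighbouring words, so both it and the threshold need tuning against real recordings.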

Out of curiosity, what is this to be used for?