audio frequency comb filter

Thread Starter

laceholes

Joined Jul 26, 2016
30
I'm looking for help in designing an audio comb filter for separating the frequencies present in a human voice singing in its lowest register, say C2 to C4. Can anyone help me please?
 

benta

Joined Dec 7, 2015
101
You need to be more specific.
What do you mean by "separating the frequencies"?
Do you want to detect the presence of the basic frequency for each sung tone?
Do you want to detect the harmonics of a tone?
I'm aware of what C2...C4 means, but otherwise I'm confused. And I'm pretty certain that you are NOT looking for a comb filter; that's something rather different.

Benta.
 

Thread Starter

laceholes

Joined Jul 26, 2016
30
Hi Benta. I want to detect the fundamental frequency of each tone sung, but in the lowest register the human voice contains little or no energy at the fundamental. We recognise those notes by ear because we detect their harmonics and the brain does the rest, so I want to emulate that detection of the harmonics electronically. The range of fundamental frequencies involved exceeds that of the lowest register, so it includes some of the notes in the next two registers; there the problem is not so difficult, because the signal will contain a fundamental. Only one note will be sung at a time.
Papabravo, I called what I want a comb filter because its frequency response will look like the teeth of a comb. Sorry if I chose the wrong word.
 

Papabravo

Joined Feb 24, 2006
21,158
I knew what you were talking about and the terminology is correct. I wasn't sure if you were looking for an analog representation or had planned a DSP approach all along. There is also an inverse comb filter, which has a succession of notches that filter out particular frequencies.
 

AnalogKid

Joined Aug 1, 2013
10,986
At the concept level, you can make a comb filter with an analog (!!!) delay line and a summing amplifier. The frequencies and widths of the comb tines are a function of the delay period and the fundamental frequency. For something where the fundamental is known and stable, like the 3.58 MHz color subcarrier in NTSC video, a comb filter can do an excellent job of picking off (or notching out) the subcarrier and its harmonics. For a speaking or singing voice, this is much more difficult. Since you said this would be for one note at a time, are you looking for something that automatically locks onto the note, or can your application tolerate manual adjustments?

This can be done without a DSP, but a frequency-tracking delay line is not simple stuff. Also, a voltage-variable delay line that covers the periods of a full octave for first formants is doable in theory, but probably not in practice without going to a digital delay or a quasi-digital one like a bucket brigade. A possibility is an acoustic delay like the springs in a reverb amp, but those are almost always fixed delays. A DSP has several advantages, not the least of which are software-based delays of any length and very steep bandpass filters without a zillion precision matched components. But the software is *very* heavy lifting.
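Just to show the delay-and-sum idea in the digital domain, here is a minimal sketch (Python/NumPy, feedforward form only; the sample rate fs and fundamental f0 are whatever you pick, not anything specific to your project):

Code:
import numpy as np

def feedforward_comb(x, fs, f0):
    # Delay-and-sum: y[n] = x[n] + x[n - D], with D set to one period of f0.
    # The response has peaks at DC, f0, 2*f0, ... and nulls halfway between them.
    x = np.asarray(x, dtype=float)
    delay = int(round(fs / f0))
    y = x.copy()
    y[delay:] += x[:-delay]
    return y

The analog version trades that one-line delay for real hardware, which is where the difficulty lives.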

ak
 

Papabravo

Joined Feb 24, 2006
21,158
AK is correct in his analysis of doing an analog comb filter. You should go the digital route so you can easily experiment with alternative approaches.
 

Thread Starter

laceholes

Joined Jul 26, 2016
30
OK analogkid and papabravo, I understand. To go the digital route, would I digitise the entire signal coming from the microphone preamp and play about with software to see what I could achieve?
 

Thread Starter

laceholes

Joined Jul 26, 2016
30
Sorry, I was just thinking aloud. What I forgot to say is that although there is only one sound at a time to be analysed for its fundamental, the software doesn't know what note it's looking for. Presumably this would mean it has to search for any recognisable note in the sound? This analysis would have to take no more than the sort of time that the other musicians playing alongside the singer consider acceptable. The lowest frequency represented solely by its harmonics in the sound is likely to be about 50 Hz, i.e. a period of 20 ms, so I would guess that a time delay of a few cycles of 50 Hz (twice as many at 2f, etc.) would be tolerable. But as the software doesn't know which note to look for, could it run fast enough to process all the possibilities in, say, two octaves' worth?
 

Papabravo

Joined Feb 24, 2006
21,158
The sampling rate has to be at least twice the highest frequency component in the sound. Normally audio harmonics above, say, the 10th are sufficiently low in amplitude that they can be absorbed by an analog anti-aliasing filter. Once the samples are acquired they are subjected to a Fast Fourier Transform, which shows the fundamental and the harmonics. A 32-bit DSP should be more than sufficient to handle the task.
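As a rough illustration of that chain (a Python/NumPy sketch with made-up numbers; the real input would be one record of your anti-aliased ADC samples, not random noise):

Code:
import numpy as np

fs = 1200                                    # sampling rate, >= 2x highest kept component
x = np.random.randn(512)                     # stand-in for one record of ADC samples
window = np.hanning(len(x))                  # window to reduce spectral leakage
spectrum = np.abs(np.fft.rfft(x * window))   # magnitude spectrum
freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)  # frequency of each FFT bin
peak = freqs[np.argmax(spectrum[1:]) + 1]    # strongest non-DC component
print("strongest component near", round(peak, 1), "Hz")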
 

Thread Starter

laceholes

Joined Jul 26, 2016
30
Thanks for that, papabravo. So if my highest note is, say, 100 Hz and it is coded by only six harmonics (as I'm told by a website on the spectrum of the human voice singing a vowel), then to sample the 100 Hz component the processor needs at least two samples in its 10 ms period, but to sample the sixth harmonic it would have to take 12 samples in the same time. Does that sound right? If it is, then it sounds as if a processor is plenty fast enough for sampling any one note. But since that one note could be any note in the octave below 100 Hz, what sampling rate would be needed? And how much time would it take to perform a fast Fourier transform and search for the correct note?
 
Last edited:

Papabravo

Joined Feb 24, 2006
21,158
Your analysis is correct. The sixth harmonic of 100 Hz. is 600 Hz. and twice that frequency is 1200 Hz., so 1200 samples per second would be the minimum sampling rate you could use. The sample period is just under 1 msec at 833.3 μsec. I cannot answer the second question because it depends on how many of those samples you use as an input and what speed your processor is running at. More information is required.

For starters, you can compute the frequency resolution when sampling at twice the highest frequency by dividing the number of samples by 2 to get the number of FFT bins, thus:

512 samples at 1200 Hz. is 426 milliseconds

512 samples / 2 = 256 FFT bins

600 Hz. / 256 FFT bins = 2.34 Hz. / FFT bin

You can change the number of samples or the amount of oversampling to change these numbers. Once you figure out how many samples you want, you can run the experiment to determine the amount of time it takes. These numbers may be available from benchmarks for the particular processor you choose. Unfortunately, this is already about 10 times the requirement of 2 cycles at 50 Hz., or 40 msec.
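In code, those numbers are just (same assumed record of 512 samples at 1200 Hz):

Code:
fs = 1200.0                   # samples per second
n = 512                       # samples per FFT record
record_time = n / fs          # ~0.427 s of audio needed per record
bins = n // 2                 # usable FFT bins up to fs/2
resolution = (fs / 2) / bins  # = fs / n, ~2.34 Hz per bin
print(record_time, bins, resolution)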
 
Last edited:

MrChips

Joined Oct 2, 2009
30,706
One method of analyzing the frequency content of an audio waveform is to perform a Fast Fourier Transform (FFT) on a microcomputer, and this is very doable. I'm actually doing this right at this moment on an STM32F demonstration project. I also did this in MATLAB for another demonstration.

However, the frequency resolution is the inverse of the sample record length. Hence to obtain 1 Hz resolution you would have to sample for 1 second.

You can play with the numbers,

Frequency Resolution = Sampling Frequency / Number of Samples

to obtain the desired resolution.

What frequency resolution do you require? One cent at 100 Hz would be approximately 0.06 Hz. It has been suggested that humans can detect changes of about 5 to 12 cents; 10 cents would be approx. 0.6 Hz at 100 Hz.
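For the record, the cent arithmetic (one cent is a frequency ratio of 2^(1/1200); a quick Python check):

Code:
f = 100.0                                 # Hz
one_cent = (2 ** (1 / 1200) - 1) * f      # ~0.058 Hz at 100 Hz
ten_cents = (2 ** (10 / 1200) - 1) * f    # ~0.58 Hz at 100 Hz
print(one_cent, ten_cents)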

If you know that the tone is in a given range, it might be better to create a band-pass filter around that frequency and do the analysis by some other means such as counting zero crossings.
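A minimal sketch of the zero-crossing idea (Python/NumPy; it assumes the signal has already been band-passed so only one dominant component is left, otherwise the count is meaningless):

Code:
import numpy as np

def zero_crossing_freq(x, fs):
    # Average the spacing of negative-to-positive transitions, in samples,
    # and convert that period to a frequency.
    neg = np.signbit(np.asarray(x, dtype=float))
    crossings = np.where(~neg[1:] & neg[:-1])[0]
    if len(crossings) < 2:
        return None
    return fs / np.mean(np.diff(crossings))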

I have no idea how musical instrument tuners work, perhaps by phase detection and comparison?

Edit: Corrected my estimate of 1 cent
 

Thread Starter

laceholes

Joined Jul 26, 2016
30
You've given me a lot to think about, papabravo. Now I'll have to spend a bit of time digesting it before I post next time. M zaid and Mr Chips, thank you both for your inputs. However, as for using a band-pass filter as you suggest, I don't know what fundamental frequency will be present, as the input will be sung music covering about an octave. Also, there will be harmonics present, so doesn't that affect the number of zero crossings? And there will be no fundamental frequency present for the lower notes, because that's how the human voice is: the note is in effect coded by its harmonic content and it's that which the listener hears, and the brain decodes it. See www.phys.ucom.edu/~gibson/Notes/Section7_4.htm
"More technically, the brain finds the greatest common divisor of all the harmonics that it hears. In other words, it looks for that frequency from which all the harmonics can be formed. For example, consider the series: 600, 800, 1000, 1200 Hz… If 600 Hz were considered to be the fundamental, 800 would not be a harmonic of this fundamental, and the overtone series would not make sense to the ear. The largest number that evenly divides each frequency of this series is 200 Hz. Then the series is 3f, 4f, 5f, 6f, … So, in this case, the "pitch" of the note is 200 Hz. Of course, 100 Hz also divides all of the frequencies in the series, as does 50 Hz and 25 Hz, etc. That is why we must consider the high frequency that divides all of the rest."
 
Last edited:

Thread Starter

laceholes

Joined Jul 26, 2016
30
The above reference is wrong. It should be newt.phys.unsw.edu.au/jw/voice.html where it says
  • "Mechanism 0 (M0) is also called ‘creak’ or ‘vocal fry’. Here the tension of the folds is so low that the vibration is not periodic (meaning that successive vibrations have substantially different lengths). M0 sounds low but has no clear pitch (Hollien and Michel, 1968). Experiment: if you hum softly the lowest note you can and then go lower, you will probably produce M0." Sorry for my error.
 

Papabravo

Joined Feb 24, 2006
21,158
The bind you are in is that you have a time constraint, 40 milliseconds, on both sampling and computation. As you can see, fewer samples means a wider bandwidth for each FFT bin, which means a much fuzzier picture of the spectrum from the sampled sound.
 

Thread Starter

laceholes

Joined Jul 26, 2016
30
Papabravo, I was thinking that a single processor would sample whatever note was being sung out of 12 possibles, and perhaps that was the wrong idea. If each possible note had its own sampler, set to the expected frequency but with a tolerance, it would require 12 samplers per octave. Sticking with 50 Hz as the lowest note and with sampler No. 12 sampling 100 Hz, would that ease the problem of speed? I don't think so, but what do you say? (There is a rough sketch of what I imagine such a "sampler" to be at the end of this post.)

Mr Chips, thanks for your numeric reply. That's very useful, but I don't understand what you mean by cents. The frequency change in going from any note to the next one chromatically is the 12th root of 2, about 6%. If the singer was off by say 2% it would be tolerable to a listener, but what about the sampler? Would it still be able to give the harmonic content so that the logic circuits could decide what note was being sung? This refers to what I wrote above: how many samplers are we talking about for one octave of sung notes?

There is still the problem that papabravo mentioned: the 40 ms. The first idea I had for deciding what the sung note is was to sample, then pass a go/no-go message about the presence of each harmonic to some logic to decide what the note is from its harmonic content. But if there were 12 samplers I suppose the logic wouldn't be required?
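By a "sampler" per note I imagine something like a single-frequency detector, one per candidate. A Goertzel-style detector is one way to sketch that (Python; the note list and the commented `block` of samples are hypothetical, just my guess at the idea, not a worked design):

Code:
import math

def goertzel_power(samples, fs, f):
    # Single-frequency detector: returns the power near frequency f in one block.
    w = 2.0 * math.pi * f / fs
    coeff = 2.0 * math.cos(w)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2

# One detector per semitone in the octave below 100 Hz (hypothetical note list)
notes = [50.0 * 2 ** (k / 12) for k in range(13)]
# powers = [goertzel_power(block, 1200, 2 * f) for f in notes]  # e.g. test the 2nd harmonic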
 
Last edited:

Externet

Joined Nov 29, 2005
2,200
Many years ago, I was in charge of a product at a broadcasting equipment factory (manufacturing, test, calibration, repair, customer service...). It had a comb filter for a specific 'stereolizer' application, with many comb 'teeth'. If I remember well, it was done in analog.
Check IC4, IC5, IC6, IC7, IC8 at page 22 ----> ftp://ftp.orban.com/275A/275A_Manual_Appendices.pdf
For low voice range usage it may need some tweaking.
I may still be able to remember/guide/answer some things about its complexity.
 