Upsampling WAV before changing audio pitch

Thread Starter

hrs

Joined Jun 13, 2014
270
Hi,

Sometimes I use Audacity to change the pitch of a song by a few semitones without changing the speed. Most of the time this works OK, other times not so much. It can introduce a glitchy 'underwater' sound and errors in the stereo image.

I recently did it again and it occurred to me that this could be due to aliasing of the band-limited audio (typically 44100 Hz), because of Nyquist ... stuff. I have only a general understanding of this and no idea how the pitch algorithm works. Then I resampled to 96000 Hz before pitch shifting and the shifted output was much improved.

Is there some math to guesstimate the optimal/minimum upsampled frequency, for example based on the ratio of the input and output base frequencies?
 

s14rs4

Joined Sep 15, 2016
52
I am not an expert in this, but are you having problems when you are shifting up in frequency? You may be shifting the upper frequencies beyond what the sampling rate can handle. Try using a low pass filter on the input before shifting, to keep the output within the range of the sampling rate.
 

Thread Starter

hrs

Joined Jun 13, 2014
270
Thanks for your input. In one particularly bad case it was 2 semitones down, say from E to D. This turned out to be fixable by first upsampling to 96000 Hz. I'm mostly wondering about the mechanism at work here.
 

s14rs4

Joined Sep 15, 2016
52
As I said earlier I am no expert. The only suggestion I can make is that you are resampling a signal that is already digitised. It is like taking a photocopy of a photocopy, you are going to lose data.

I am probably completely wrong. :rolleyes:
 

bogosort

Joined Sep 24, 2011
566
Is there some math to guesstimate the optimal/minimum upsampled frequency, for example based on the ratio of the input and output base frequencies?
Consider the simple case: pitch-shifting an octave up, which doubles the original frequencies. If the highest frequency in the original signal is at or above half the Nyquist frequency, then the pitch-shift algorithm will produce frequencies that will exceed Nyquist and so alias on playback. I think all you have to do is figure out the highest significant frequency in the signal and work out how much it will increase with pitch shifting. If the new frequency is higher than Nyquist, upsample so that it is at least a few kHz below Nyquist.

Most vocal lines and many instruments don't have much energy above 15 kHz, so pitch-shifting up a couple of semitones shouldn't be a problem at the base rate. The pitch-shift algorithm might work better at a higher rate regardless of the distance to Nyquist, but that's a different issue. However, pitch-shifting up a large amount, or shifting high-frequency content (like cymbal hits) will almost certainly push the new frequencies beyond Nyquist and so benefit from upsampling. Except for extreme effects, I doubt anything higher than 96k would be needed.
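As a rough sketch of that estimate in Python: a shift of n semitones scales every frequency by 2^(n/12), so the shifted highest frequency plus some headroom has to stay below half the sample rate. The function names and the 2 kHz default margin below are assumptions made for illustration (the margin stands in for "a few kHz below Nyquist"), and this only addresses aliasing, not how well any particular pitch-shift algorithm behaves.

```python
def shifted_frequency(freq_hz, semitones):
    """Frequency after a shift of `semitones` equal-tempered semitones."""
    return freq_hz * 2 ** (semitones / 12)


def min_sample_rate(highest_freq_hz, semitones, margin_hz=2000.0):
    """Lowest sample rate that keeps the shifted content below Nyquist.

    margin_hz is an assumed guard band, not a value from any standard.
    """
    return 2 * (shifted_frequency(highest_freq_hz, semitones) + margin_hz)


# Common audio sample rates, lowest first.
STANDARD_RATES = (44100, 48000, 88200, 96000, 176400, 192000)


def pick_standard_rate(highest_freq_hz, semitones, margin_hz=2000.0):
    """First common rate that satisfies min_sample_rate, or None."""
    needed = min_sample_rate(highest_freq_hz, semitones, margin_hz)
    for rate in STANDARD_RATES:
        if rate >= needed:
            return rate
    return None


# Content up to 15 kHz shifted up 2 semitones still fits at the base rate.
print(pick_standard_rate(15000, 2))
# Content up to 20 kHz shifted up an octave needs a higher rate.
print(pick_standard_rate(20000, 12))
```

With these assumptions the first case stays at 44100 Hz and the second needs 88200 Hz. A downward shift never raises the requirement, which is another hint that aliasing alone does not explain artifacts when shifting down.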
 

bogosort

Joined Sep 24, 2011
566
As I said earlier I am no expert. The only suggestion I can make is that you are resampling a signal that is already digitised. It is like taking a photocopy of a photocopy, you are going to lose data.
I think you're confusing upsampling (changing the sample rate) with resampling (ADC -> DAC -> ADC). There is no data loss in upsampling.
 

bogosort

Joined Sep 24, 2011
566
Thanks for your input. In one particularly bad case it was 2 semitones down, say from E to D. This turned out to be fixable by first upsampling to 96000 Hz. I'm mostly wondering about the mechanism at work here.
Since you heard artifacts when down-shifting the signal, which was fixed by upsampling, it seems unlikely that aliasing is the problem. I don't have a good sense of how pitch-shift algorithms typically work (putting it on my TODO list), but I can imagine that the extra samples added by upsampling can help the algorithm, even if most of the extra samples are zeros.
 

s14rs4

Joined Sep 15, 2016
52
I think you're confusing upsampling (changing the sample rate) with resampling (ADC -> DAC -> ADC). There is no data loss in upsampling.
No, what I was referring to was taking a signal sampled at one rate, and then saving the same file at a higher rate.

I had a friend who was an audiophile who I gave some recordings to, at 128k sample rate. He was very grateful but said they would have been better at 320k. So I saved the 128k file as 320k and he was happy. It was the same 128k file saved as 320k. Bigger file more losses.
 

bogosort

Joined Sep 24, 2011
566
No, what I was referring to was taking a signal sampled at one rate, and then saving the same file at a higher rate.

I had a friend who was an audiophile who I gave some recordings to, at 128k sample rate. He was very grateful but said they would have been better at 320k. So I saved the 128k file as 320k and he was happy. It was the same 128k file saved as 320k. Bigger file more losses.
I'm not sure that we're talking about the same thing. A 128 kHz sample rate is very unusual for a recording -- standard rates are 44.1, 48, 96, and 192 kHz. Perhaps you're thinking of MP3 bitrate? That's an entirely different thing. In any case, one doesn't "save the file" at a higher rate. Upsampling is a process in which you insert zeros between the samples and then low-pass filter at the original Nyquist frequency to remove the resulting spectral images. File size has nothing to do with data loss.
 

s14rs4

Joined Sep 15, 2016
52
I apologise. In the first post the TS says he is using Audacity, my experience with that has been MP3 files including pitch shifting.

I should keep my mouth shut when I don't know what I'm talking about.
 

Thread Starter

hrs

Joined Jun 13, 2014
270
Since you heard artifacts when down-shifting the signal, which was fixed by upsampling, it seems unlikely that aliasing is the problem. I don't have a good sense of how pitch-shift algorithms typically work (putting it on my TODO list), but I can imagine that the extra samples added by upsampling can help the algorithm, even if most of the extra samples are zeros.
I was unaware of the zeros! I thought it would somehow interpolate and inject new values. The library used by Audacity says this:
The design was inspired by Laurent De Soras' paper `The Quest For The
Perfect Resampler', http://ldesoras.free.fr/doc/articles/resampler-en.pdf;
in essence, it combines Julius O. Smith's `Bandlimited Interpolation'
technique (https://ccrma.stanford.edu/~jos/resample/resample.pdf) with FFT-
based over-sampling.
[Edit]And it seems that digital resampling, upsampling and oversampling are used somewhat interchangeably. Confusing.[/edit]
The mental image I had is something like this (please ignore the lower time axis):
[Attached image: stretched.png]
Now if the upsampled rate is close to a common multiple of the original and the stretched rates, this might give close to perfect pitch shifting. But that's just guesswork.

I apologise. In the first post the TS says he is using Audacity, my experience with that has been MP3 files including pitch shifting.

I should keep my mouth shut when I don't know what I'm talking about.
There is nothing to apologise for, I appreciate your suggestions. Indeed the original source is an mp3 most of the time, but the effect that I'm talking about is when shifting a WAV to a WAV.
 

bogosort

Joined Sep 24, 2011
566
I was unaware of the zeros! I thought it would somehow interpolate and inject new values.
In sampling contexts, interpolation simply means low-pass filtering. Think of it this way: suppose we have a 1-second digital audio signal that was sampled at a rate of 48 kHz. In the digital domain there is no notion of time -- we have 48,000 samples, each of which is just an amplitude value. The amplitudes have an implicit phase relationship formed by the original signal and the original sampling rate, but the DAC (digital to analog converter) has no idea about any of this -- it just sees 48,000 individual sample values.

Suppose we set the DAC's sample rate to 96 kHz. At this rate, one second of audio equals 96,000 samples, and so with 48,000 samples the DAC will produce 0.5 seconds of audio. Since the implicit phase relationships between the amplitudes haven't changed, the original frequencies are being played in half the time, and we will hear everything pitched up an octave.
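The arithmetic can be checked with a few lines of Python. This is only a sketch of the scaling argument; the function name and the 440 Hz test tone are made up for illustration.

```python
import math


def play_at_rate(num_samples, tone_hz, recorded_rate_hz, playback_rate_hz):
    """Duration and perceived pitch when samples are played at another rate."""
    ratio = playback_rate_hz / recorded_rate_hz
    duration_s = num_samples / playback_rate_hz
    heard_hz = tone_hz * ratio
    shift_semitones = 12 * math.log2(ratio)
    return duration_s, heard_hz, shift_semitones


# One second recorded at 48 kHz, played back by a DAC running at 96 kHz.
duration_s, heard_hz, shift_semitones = play_at_rate(48000, 440.0, 48000, 96000)
print(duration_s, heard_hz, shift_semitones)  # 0.5 880.0 12.0
```

Half the duration, double the frequency, 12 semitones up: the octave described above.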

Of course, when we upsample a signal we only want to change the sample rate, not the actual signal -- we want to leave the time and frequency scales of the original signal unchanged. To preserve the original signal's scaling, we have to introduce new samples into the signal: to upsample by a factor of N, insert N - 1 new samples after each original sample. This makes the effective sample rate N * Fs, where Fs is the original sample rate. And since these new samples represent only the extra bandwidth of the new rate (and not extra signal information), these new sample values can be zero.

However, just as in the original sampling process, the new upsampled signal will have spectral images of the original signal centered at integer multiples of the original sample rate. To get rid of these images we need to low-pass filter the upsampled signal at the original Nyquist frequency. In the frequency domain, this is equivalent to multiplying by a rectangular function; in the time domain, this is equivalent to convolution with shifted sinc functions, i.e., interpolation. After filtering, we're left with a sequence of samples that represent the original signal at the new sample rate.

Here's a MATLAB demonstration using a signal composed of the sum of two cosines at 1 Hz and 3 Hz, originally sampled at 20 Hz, being upsampled to 60 Hz:

[Attached image: 1608744010162.png]

Time domain is on the left in blue, frequency domain on the right in red. You can see in the middle-left plot that the upsampled but as-yet uninterpolated signal is filled with zeros between the original samples. This causes the spectral images in the middle-right plot. On the bottom-left, after interpolation (LPF), the zero-samples have been mapped to the proper sample values at the new sample rate. This isn't interpolation in the ordinary sense of "guessing" what the values should be, rather it's the fundamental result of Shannon's sampling theorem: given a bandlimited signal, there is only one signal -- the original signal -- that passes through those points.
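For anyone without MATLAB, the same experiment can be sketched in Python. This is not the script behind the plots above, just a minimal reconstruction under stated assumptions: the signal is exactly one second long so that it is periodic in the block, the low-pass filter is an ideal brick-wall applied by zeroing DFT bins, and a naive DFT is used only to keep the example dependency-free (numpy.fft would be the usual choice). All function names are made up for the example.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform, fine for a tiny demo."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[k] * cmath.exp(sign * 2j * math.pi * k * m / n) for k in range(n))
        for m in range(n)
    ]
    return [v / n for v in out] if inverse else out


def signal(t):
    """Sum of two cosines at 1 Hz and 3 Hz."""
    return math.cos(2 * math.pi * 1 * t) + math.cos(2 * math.pi * 3 * t)


def upsample(x, factor):
    """Upsample one period of a bandlimited signal by an integer factor."""
    # Step 1: zero-stuffing, insert (factor - 1) zeros after each sample.
    stuffed = []
    for sample in x:
        stuffed.append(sample)
        stuffed.extend([0.0] * (factor - 1))
    # Step 2: brick-wall low-pass at the original Nyquist frequency,
    # implemented by zeroing the DFT bins that hold the spectral images.
    spectrum = dft(stuffed)
    n = len(stuffed)
    cutoff = len(x) // 2  # bin index of the original Nyquist frequency
    for m in range(n):
        freq_bin = min(m, n - m)  # fold negative frequencies onto positive
        if freq_bin >= cutoff:
            spectrum[m] = 0.0
    # Step 3: a gain of `factor` restores the amplitude lost to the zeros.
    filtered = [factor * v.real for v in dft(spectrum, inverse=True)]
    return stuffed, filtered


fs_old, fs_new = 20, 60
x = [signal(k / fs_old) for k in range(fs_old)]  # one second at 20 Hz
stuffed, y = upsample(x, fs_new // fs_old)

# Compare against the signal sampled directly at the new rate.
reference = [signal(k / fs_new) for k in range(fs_new)]
max_error = max(abs(a - b) for a, b in zip(y, reference))
print(f"{len(x)} samples in, {len(y)} samples out, max error {max_error:.2e}")
```

The maximum error should come out at the level of floating-point rounding, which is the sampling theorem in action: the filtered zeros land on the values the signal would have had if it had been sampled at 60 Hz in the first place. Real resamplers use finite-length filters on non-periodic audio, so their results are close approximations rather than exact.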

And it seems that digital resampling, upsampling and oversampling are used somewhat interchangeably. Confusing.
Yes, indeed. This is how I use the terms, and I believe it is the standard usage in the DSP literature. Upsampling (and downsampling) means changing the sample rate of a digital signal (i.e., one that has already been sampled). Resampling can be used as an umbrella term for these two purely digital processes, or it can refer to using another analog conversion pass to change the rate, i.e., a digital signal is sent to a DAC at the original rate and re-captured with an ADC at the new rate. I've also seen it used in the context of pitch- and time-scale effects (where up/down sampling is combined with other processes).

Oversampling, in contrast, means sampling an analog signal (i.e., a signal that has not yet been sampled) using a rate far higher than the Nyquist criterion would require. The typical usage of oversampling, as in delta-sigma ADCs, is to trade excess bandwidth for increased resolution. It's a really cool topic, but not relevant to up/down sampling or pitch-shifting.
 

Thread Starter

hrs

Joined Jun 13, 2014
270
Wow, thank you for your time there, bogosort. It clears up some misconceptions that I had. DSP is a deep subject.
 