Analog filter to digital.

Thread Starter

Christiaan Buijse

Joined Apr 4, 2019
19
Well, the original question was about fixed point, sort of. We've been going on a bit of a trip, I suppose.

But the question about large coefficients came up, which I think I fixed by normalizing a10 and a20.
As for the coefficients I got, I think they are right, as the Bode plot matches that of the analog transfer function.

Now then, on to your transfer function: we'd start by substituting s = (2/T)*(z-1)/(z+1). For this we need a sample rate, which in my case would be 20000.
We'd then have to simplify this.
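
As a sketch of that substitution: the thread does not show the actual transfer function, so this assumes a single second-order section H(s) = (nb2*s^2 + nb1*s + nb0)/(na2*s^2 + na1*s + na0). The struct and function names are made up for illustration. Replacing s by K*(z-1)/(z+1) with K = 2/T and multiplying through by (z+1)^2 gives the digital coefficients directly:

```c
/* Hypothetical container for one second-order section:
   H(z) = (b0 + b1 z^-1 + b2 z^-2) / (1 + a1 z^-1 + a2 z^-2) */
typedef struct {
    double b0, b1, b2;
    double a1, a2;
} biquad_coeffs;

/* Bilinear transform of
   H(s) = (nb2 s^2 + nb1 s + nb0) / (na2 s^2 + na1 s + na0)
   using s = K (z - 1)/(z + 1) with K = 2/T = 2*fs.
   No frequency prewarping is applied in this sketch. */
static biquad_coeffs bilinear_biquad(double nb2, double nb1, double nb0,
                                     double na2, double na1, double na0,
                                     double fs)
{
    double K  = 2.0 * fs;
    double K2 = K * K;
    double A0 = na2 * K2 + na1 * K + na0;  /* divide by this so that a0 = 1 */
    biquad_coeffs c;

    c.b0 = (nb2 * K2 + nb1 * K + nb0) / A0;
    c.b1 = 2.0 * (nb0 - nb2 * K2) / A0;
    c.b2 = (nb2 * K2 - nb1 * K + nb0) / A0;
    c.a1 = 2.0 * (na0 - na2 * K2) / A0;
    c.a2 = (na2 * K2 - na1 * K + na0) / A0;
    return c;
}
```

With fs = 20000, a quick sanity check is the DC gain: (b0+b1+b2)/(1+a1+a2) should equal nb0/na0, the DC gain of the analog filter.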


 

MrAl

Joined Jun 17, 2014
13,709
Hi,

Ok, looks good. I'll look up my previous notes on doing integer fixed point tomorrow and post some more details. It's getting late here, unless I stay up a bit longer. I haven't done this in a long while now, but it is not that hard to do.
I seem to remember doing this on the Arduino platform at one time too. The main idea is that everything stays an integer until you actually have to interface with the outside world; then you convert to a float as the last step. To get good resolution you need 64 bit ints, unless 32 bits is good enough, which brings us to the fact that range vs precision is always a trade-off.
There are also things written about the effects of finite register length. I'll really have to look that up; that's from the very distant past for me :)
It will be interesting to review a little, though, as usual.
 

MrAl

Joined Jun 17, 2014
13,709
https://dsp.stackexchange.com/quest...-iir-filter-with-constant-coeffic/21800#21800

The code in there seems relevant and interesting as far as fixed point goes
Hi,

Here are some notes from my graphics processing DLL that I made a long time ago. The notes themselves have not been modified, but I will add some explanation after them.

[START NOTE]
const i64 AugFactorH=65536;
const i64 AugFactor=65536; //vertical, see above AugFactorH
/*
AugFactorH (and AugFactor) is used to augment a type long by multiplying it by some factor like 256.
Once a number is multiplied by 256 it becomes more precise, although the storage range decreases.
This allows us to store numbers as type long but maintain them as 'augmented' numbers, which need to be
divided by 256 before using for anything in the real world. The extra precision using 256 is about 1/256.
This means we can store a number between say 1 and 2 without using type float or double. Say we wanted
to store the number 15/10 (which in floating point is 1.5). We cant store 1.5 in an integer type long,
so we multiply it by 256 in this exact order:
y=(15*256)/10
and so y=3840 instead of 1.5 as above.
What this means is that we are storing the number 3840 when we want to represent the float 1.5, and this
allows us to store floats like this without using a float type. When we want the actual number back,
we divide by 256, but there are usually many other operations in between that rely on the new precision.
Those numbers too might be represented the same way, but they may be just regular integers without
being augmented. It depends on the kind of operation being done. For addition and subtraction
both numbers need to be augmented, but for multiplication they dont unless both numbers are already
augmented and need the extra precision. In any case, the range has to be limited or else the type
long will overflow.
This of course works much better with type __int64, because that can hold 64 bits of precision
instead of just 32 like a type long. In cases where a 32 bit would overflow a 64 bit int has to
be used, and this program does just that in places where there is likely to be an overflow if the
file size in and out are too big or the ratio of the two sizes is high.
*/
[END NOTE]

So using the 256 multiplier, pi would become 804 and 2 would become 512.
Now adding, we would get 804+512=1316, which when we go to use it in the real world would become 1316/256=5.140625, when the float value would have been 5.14159265.
Using the 65536 multiplier, pi becomes 205887 and 2 becomes 131072. The sum is 336959, and dividing by 65536 we get float 5.1415863,
so the accuracy has been improved.
Of course you must ensure that no operation, such as a multiplication, causes an overflow during any calculation.
The multiplication of 2 times pi comes out to 205887*131072=26986020864, which is large, so you must use a register bit size that can handle that.
That divided by 65536^2 is 6.283172607421875, and the more accurate float value is 6.283185307179586.
Note how for multiplication we had to divide by the square of the scale factor to get the final float value.
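
The rescaling after a multiply can be sketched in C as follows. This is a minimal sketch with made-up names (SCALE, scaled_mul, scaled_to_double). It divides out one factor of the scale right after the multiply, so the result stays in the same scaled form as the operands. That is one common way to do it; for the numbers here it gives the same real-world value as dividing by the square at the very end.

```c
#include <stdint.h>

#define SCALE 65536   /* 2^16, the larger scale factor used above */

/* Product of two scaled values. The raw product carries SCALE twice,
   so one factor of SCALE is divided out to get back to normal scaling.
   The 64-bit type is what holds the large intermediate product. */
static int64_t scaled_mul(int64_t a, int64_t b)
{
    return (a * b) / SCALE;
}

/* Convert to a real-world value as the last step. */
static double scaled_to_double(int64_t a)
{
    return (double)a / SCALE;
}
```

With pi stored as 205887 and 2 stored as 131072, scaled_mul returns 411774, and scaled_to_double(411774) is 6.283172607421875.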
 

Thread Starter

Christiaan Buijse

Joined Apr 4, 2019
19
Hello,
I've made an attempt, which I will have to try sometime next week.

I've upscaled all coefficients by 2^14 and use 32 bits for the math. I'll see how it goes.
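
For illustration, upscaling a coefficient by 2^14 might look like the sketch below. The poster's actual code is not shown in the thread, so the name coeff_to_q14 and the round-to-nearest choice are assumptions.

```c
#include <stdint.h>

#define Q14_ONE 16384   /* 2^14 */

/* Round a floating-point coefficient to the nearest multiple of 1/2^14
   and return it as a scaled integer. */
static int32_t coeff_to_q14(double c)
{
    double scaled = c * Q14_ONE;
    return (int32_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}
```

The resolution is 1/16384, about 0.000061, so any coefficient smaller in magnitude than half of that rounds to zero.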
 

MrAl

Joined Jun 17, 2014
13,709
Hi,

Ok, see what happens.

I think the largest number 'x' you can handle obeys:
x<2^((m-2*n)/2)

where
n is the power of 2 that makes up your scaling factor, and
m is the register bit length.

So for your power of 14 and your 32 bit register we end up with:
n=14
m=32

so the max number is:
x<4

So 3.99 would be ok, for example, but 4.00 would not, because it would cause an overflow during a multiplication of 4.00*4.00.
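
The limit can be checked with a small C sketch. The function names are made up for illustration, and it assumes an unsigned register with m below 64 and m - 2*n even and non-negative.

```c
#include <stdint.h>

/* Bound from x < 2^((m - 2n)/2). Returns the bound itself;
   x has to stay strictly below it. */
static uint64_t max_magnitude_bound(unsigned m, unsigned n)
{
    return (uint64_t)1 << ((m - 2 * n) / 2);
}

/* Does the product of two scaled values fit in an unsigned m-bit
   register? Both inputs are assumed to be below 2^32 so that the
   64-bit product used for the check cannot itself overflow. */
static int product_fits(uint64_t a_scaled, uint64_t b_scaled, unsigned m)
{
    return a_scaled * b_scaled < ((uint64_t)1 << m);
}
```

With n=14, 4.00 is stored as 65536, and 65536*65536 is exactly 2^32, one more than a 32 bit register can hold. 3.99 is stored as 65372, and 65372*65372 = 4273498384, which still fits.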

For my application the max 'x' was 256, so the max scale factor for a 32 bit register was 256.
I don't remember why I went to a scale factor of 65536 when using a 64 bit integer type. Maybe that made dividing faster, because that is 2^16, which is exactly 2 bytes, so I could just read the upper bytes for the result.

Note that if you need both positive and negative values, you may have to decrease this limit as well, since the sign takes up one bit.
 

Thread Starter

Christiaan Buijse

Joined Apr 4, 2019
19
Yes, I have values from roughly -2 to 2, so I chose 2^14. Multiplying two 16 bit values requires a 32 bit int for the result, so I am using 32 bit ints for everything.
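
A multiply under those assumptions could be sketched as below. This is not the poster's code: the name q14_mul is made up, and dividing by 16384 is used instead of a right shift because right-shifting a negative value is implementation-defined in C.

```c
#include <stdint.h>

/* Multiply two values scaled by 2^14 and held in 16-bit integers.
   The 32-bit product is scaled by 2^28; dividing by 2^14 brings it
   back to the 2^14 scaling. The caller must make sure the true
   result lies within -2 to just under 2, or the cast will wrap. */
static int16_t q14_mul(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;  /* magnitude at most 2^30 */
    return (int16_t)(product / 16384);           /* truncates toward zero */
}
```

Note that +2.0 itself would be stored as 32768, which does not fit in a signed 16 bit value, so the usable range is -2 up to just under 2.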
 