Why don't these approximations seem to converge?

Thread Starter

WBahn

Joined Mar 31, 2012
29,932
Say I want to solve the following equation for k

\(
\left( 1 - \frac{1}{D} \right)^k = p
\)

for values of p near 0.5.

The exact solution is

\(
k = \frac{\ln(p)}{\ln\left( 1 - \frac{1}{D} \right)}
\)

But if D is large, then calculating the ln() in the denominator is problematic.
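To make the problem concrete, here is a minimal sketch (Python, just for illustration) of what happens in ordinary IEEE-754 double precision:

import math

p = 0.5
D = 2.0 ** 256            # 1/D ~ 8.6e-78 is representable as a double...
x = 1.0 - 1.0 / D         # ...but the subtraction rounds to exactly 1.0
print(x == 1.0)           # True
print(math.log(x))        # 0.0, so k = log(p)/0.0 is a divide by zero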

So I can think of two approximations that should be valid when D is very large (as in something like 2^256).

The first is the approximation that

\(
\ln(1 + x) \approx x
\)

Applying this to the denominator of the solution, we get

\(
k = -D \ln(p) = D \ln\left( \frac{1}{p} \right)
\)
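As a quick sanity check of this first approximation (a Python sketch; math.log1p computes ln(1 + x) accurately for tiny x, so it serves as the "exact" reference here):

import math

p = 0.5
for D in (1e3, 1e6, 1e9):
    exact = math.log(p) / math.log1p(-1.0 / D)
    approx = D * math.log(1.0 / p)
    print(D, approx / exact)    # ratio approaches 1 as D grows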

Another approximation is that

\(
(1 + x)^k \approx 1 + kx
\)

Applying this to the original expression, we get

\(
k = D(1 - p)
\)

If we set p = 0.5, then the first approximation yields

\(
k = D \ln(2) \approx 0.693 D
\)

While the second yields

\(
k = 0.5 D
\)

For several reasons, I have more confidence in the first approximation. But I would expect both approximations to converge as 1/D goes to zero (meaning that the ratio of the two approximations approaches unity), yet clearly they don't.
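A quick numerical check (Python, for illustration) confirms the puzzle: the ratio of the two approximations is pinned at 0.5/ln(2), roughly 0.72, no matter how large D gets:

import math

p = 0.5
for D in (1e3, 1e6, 1e12):
    k1 = D * math.log(1.0 / p)   # from ln(1 + x) ~ x
    k2 = D * (1.0 - p)           # from (1 + x)^k ~ 1 + kx
    print(D, k2 / k1)            # ~0.7213 every time, never -> 1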

Any insights would be appreciated.
 

wayneh

Joined Sep 9, 2010
17,495
Something is wrong with the second approximation. It doesn't provide a good approximation and it doesn't get any closer as D increases. I'm familiar with the first one, which works really well in this case, but not with the second, which doesn't seem to work. Maybe double-check the context of wherever it came from.

[Attached image: Screen Shot 2017-07-08 at 5.41.59 PM.png (table comparing the exact value with the two approximations)]
 

Thread Starter

WBahn

Joined Mar 31, 2012
29,932
The second one is just the Binomial approximation

https://en.wikipedia.org/wiki/Binomial_approximation

I've used it many times over the years and have never had a (noticeable) problem.

If I look at the ratio of

\(
\frac{1 + \frac{k}{D}}{\left( 1 + \frac{1}{D} \right)^k}
\)

for k = 10: with D=10 the ratio is 77%, but for D=1000 the ratio is unity to nearly five sig figs, and for D=100000 it's good out to nine. In fact, every time I increase D by a factor of ten, the two match to an additional two sig figs, which is what I would expect since the first term left off in the approximation is the quadratic term.
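Those ratios are easy to reproduce (a Python sketch):

k = 10
for D in (10.0, 1e3, 1e5):
    print(D, (1.0 + k / D) / (1.0 + 1.0 / D) ** k)
    # ~0.771, ~0.99996, ~0.999999996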
 

Thread Starter

WBahn

Joined Mar 31, 2012
29,932
It must be something in the setup we’re both missing. Maybe write out the intermediate steps.
There's not much to those. Here's all there is for the second approximation (I'll leave out the details for the first since I think we are both in agreement that it yields the proper results).

\(
\left( 1 - \frac{1}{D} \right)^k = p
\)

Using the Binomial Approximation

\(
(1 + x)^k \approx 1 + kx
\)

\(
\left( 1 + \left( -\frac{1}{D} \right) \right)^k \approx 1 + k\left( -\frac{1}{D} \right) = 1 - \frac{k}{D}
\)

thus

\(
1 - \frac{k}{D} = p
\)

Solving for k yields

\(
k = D(1 - p)
\)

If we set p = 0.5, then the result is

\(
k = 0.5 D
\)
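A quick check of where that answer actually lands (a Python sketch): plugging k = 0.5 D back into the left-hand side gives e^(-0.5), about 0.607, rather than the target 0.5, even for enormous D:

for D in (1e2, 1e4, 1e8):
    print(D, (1.0 - 1.0 / D) ** (0.5 * D))   # -> 0.60653..., not 0.5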
 

Thread Starter

WBahn

Joined Mar 31, 2012
29,932
I think the problem is that there is some subtlety that makes using this to solve for the exponent invalid. I can live with that, but only if I can understand what that subtlety is and why it makes doing this invalid so that I understand the bounds of what I can and can't do.
 

wayneh

Joined Sep 9, 2010
17,495
I think the problem is that there is some subtlety that makes using this to solve for the exponent invalid. I can live with that, but only if I can understand what that subtlety is and why it makes doing this invalid so that I understand the bounds of what I can and can't do.
It ought to work! Does it get better if you include the second (quadratic) term?
 

Thread Starter

WBahn

Joined Mar 31, 2012
29,932
It ought to work! Does it get better if you include the second (quadratic) term?
Nope. If anything, it's worse.

But I figured it out.

My earlier tests just used a fixed value of k=10 and then let D grow, and they showed that as D grew the approximation became better and better. But it didn't match well for D=10, and I just wrote that off to D being too small. By D=100 it was getting decent (within 1%), so I was confused, since the numbers I am using have D = 2^256, which makes 1/D close to the ratio of one atom to the number of atoms in the known universe. But when I changed k to 1000 I immediately noticed that the agreement for D=1000 now sucked and it only started getting better at D=10000. A couple more trials and I became convinced that not only does 1/D have to be small, but k/D has to be small as well.

So I decided to read the Wiki looking specifically for this and, sure enough, it states right up front for those who, unlike me, are not too lazy to pay attention.

The approximation has TWO conditions. The first one is the one that I've always focused on and relied on.

\(
(1 + x)^n \approx 1 + nx
\)

I've always "known" that the requirement is for |x| << 1.

It turns out, this isn't even correct -- it only requires that |x| < 1.

So x can actually be arbitrarily close to 1 and the approximation still holds.

The second condition is that |nx| << 1.

So that requires that the product of the exponent and x be much smaller than 1.

That is where my use fails. The necessary value of n makes nx (or k/D) on the order of 0.5, thus violating the requirement.

Having now discovered this, it is ringing a bell and I suspect that I did run into this before and finally found this same little nugget, but in the intervening years have forgotten it.

At least now I understand why my two results weren't converging and there is order in the universe once again.
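For anyone who wants to see the two conditions in action, here is a minimal sketch (Python): the same tiny x in both rows, with only n changing, so the error is clearly governed by |nx| rather than |x| alone.

x = 1e-6
for n in (10, 500_000):          # nx = 1e-5 vs nx = 0.5
    exact = (1.0 + x) ** n
    approx = 1.0 + n * x
    print(n, approx / exact)     # ~1.0, then ~0.91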
 

MrAl

Joined Jun 17, 2014
11,342
Hi there,

The first thing I noticed about the first post was that there was no range given for any of the approximations. When we look in good math reference books we almost always see a range of values over which the approximation is valid, because many approximations only work in a certain interval, and when we go outside that range we get bad values or values that do not meet the required accuracy. Even if the range is infinite, we'll often see -inf < x < inf or something like that.

When we come up with our own approximations (or curve fits), it is up to us to figure out the range of validity. If we don't do that, we won't know whether we are getting a good result for all the possible parameter values we might stick in there some fine day. So one of the rules of approximations is that we have to specify the range of validity, and sometimes it's wise to write -inf < x < inf explicitly so we know we went through that process at one time and found the approximation accurate over the whole range of the parameter.

In the process of this investigation, we may find that the approximation is not suitable for a given application, so we either have to find a new one or divide the parameter range into intervals, with a different approximation for each interval. Obviously, the fewer intervals the better.
 

wayneh

Joined Sep 9, 2010
17,495
At least now I understand why my two results weren't converging and there is order in the universe once again.
Whew! That was close.

Here's the table I posted earlier, this time with p = 0.99 instead of the original 0.5. It's all good now.

[Attached image: Screen Shot 2017-07-10 at 12.32.09 PM.png (same comparison table, now with p = 0.99)]
 

Papabravo

Joined Feb 24, 2006
21,094
That's almost the first thing they mention when they describe a Taylor (Maclaurin) series:
  1. The approximation is valid in some neighborhood of a single point.
  2. If you want a better approximation, you need more terms, or you need to expand about a different point.
  3. For a truncated Taylor series you can estimate the magnitude of the error.
Glad you resolved the dilemma.
 

Thread Starter

WBahn

Joined Mar 31, 2012
29,932
Hi there,

The first thing i noticed about the first post was that there was no range given for any of the approximations.
I stated that the approximation was for large D, meaning that 1/D is small.

Even in math textbooks this is often the kind of qualitative description you see. When written mathematically, it is usually something like |x| << 1. It is almost never defined what constitutes "much smaller than".
 

Thread Starter

WBahn

Joined Mar 31, 2012
29,932
That's almost the first thing they mention when they describe a Taylor (Maclaurin) series:
  1. The approximation is valid in some neighborhood of a single point.
  2. If you want a better approximation, you need more terms, or you need to expand about a different point.
  3. For a truncated Taylor series you can estimate the magnitude of the error.
Glad you resolved the dilemma.
And I would have said that 1 - 2^-256 is well within some "small neighborhood" of 1. The next order term is on the order of 2^-512.
 

MrAl

Joined Jun 17, 2014
11,342
I stated that the approximation was for large D, meaning that 1/D is small.

Even in math textbooks this is often the kind of qualitative description you see. When written mathematically, it is usually something like |x| << 1. It is almost never defined what constitutes "much smaller than".
Hi,

Sorry, that's not what I meant. However, my book states |x| < 1.

What I meant was that when we come up with an approximation, it is up to us to do the work. What this means is that we are not just asking for a solution at a particular value (like one D, for example); we are actually exploring and stating what works and what does not work. Perhaps the original function works up to D = 1e10, for example.
For a more comprehensive example, in the original function we can tie D to a required number of digits. For example (just random numbers for now), with D < 1e10 we need up to 16 digits and either large integers or large floats. For D < 1e32 we need 32-digit floats, etc. These values are made up, but they address the real issue in a real way: D goes with the number of digits required to use the original formula. If we had infinite resolution we could use any D, even 1e999, but using 1e999 could still be done with less than infinite resolution, and it would be our job to discover what would be required of the number cruncher. This pertains to the denominator log(1 - 1/D) where D gets large.

I realize this may be going beyond what you wanted to do though, so it's just an example really.

Interestingly, one of the approximations did not show up in my math reference book :)
 

Thread Starter

WBahn

Joined Mar 31, 2012
29,932
Hi,

Sorry, that's not what I meant. However, my book states |x| < 1.
But 1/D with D = 2^256 most definitely satisfies that requirement. Yet it doesn't work.

The reason is that the requirement is not ONLY that |x| < 1, but ALSO that |nx| << 1. It's the SECOND constraint that was being violated, not the first. If your book doesn't provide the second constraint, then get another book. I imagine that it does provide it, however, and that (like me) you just weren't reading carefully enough.
 

MrAl

Joined Jun 17, 2014
11,342
But 1/D with D = 2^256 most definitely satisfies that requirement. Yet it doesn't work.

The reason is that the requirement is not ONLY that |x| < 1, but ALSO that |nx| << 1. It's the SECOND constraint that was being violated, not the first. If your book doesn't provide the second constraint, then get another book. I imagine that it does provide it, however, and that (like me) you just weren't reading carefully enough.
Hi,

Are you talking about the log(1 - 1/D) with D = 2^256 still? You're saying it doesn't work, and that baffles me a little. The only thing that "does not work" (if I understand you right) is D --> inf. So roughly speaking, anything less than infinity works; you just can't restrict your resolution as much as you must be doing.

The book is very clear, but perhaps I did not make it clear what it does not contain. It does not contain the approximation 1 + k*x. Maybe it's elsewhere in the book? I'd have to check the rest, but these books don't contain every approximation under the sun because there are too many. It does have a section on numerical methods, though there are entire books on numerical methods, so it only has the most common ones like Newton's, Lagrange, etc. It's a nice section but can't contain everything.
It's been a while now since I checked other math references. Haven't been to the library in a while now either :)

Taylor series sometimes work nicely too.
 

Thread Starter

WBahn

Joined Mar 31, 2012
29,932
Hi,

Are you talking about the log(1 - 1/D) with D = 2^256 still? You're saying it doesn't work, and that baffles me a little. The only thing that "does not work" (if I understand you right) is D --> inf. So roughly speaking, anything less than infinity works; you just can't restrict your resolution as much as you must be doing.
I'm talking about the Binomial Approximation -- it's an approximation that is so widely used that it actually has a name.

\(
\( 1 \; + \; x\)^n \; \approx \; 1 \; + \; nx
\)

This works perfectly as x goes to zero (or, with x = -1/D, as D goes to infinity) as long as n is finite. Both the exact expression and the approximation go to 1 in the limit, just as you would hope.

The problem is that the range of applicability not only places limits on the value of x, namely

\(
|x| < 1
\)

but also on the value of n in relation to x, namely

\(
|nx| \ll 1
\)

or equivalently

\(
|n| \ll \left| \frac{1}{x} \right|
\)

Just having x be small is simply not good enough. The example in the original post demonstrates this.

In trying to find the value of n such that (1 + x)^n equals 1/2, the approximation does NOT converge on the correct answer as x (which is negative) gets smaller and smaller in magnitude. The reason is that the value of n needs to be on the order of 1/x, violating the second condition. It doesn't matter how small x gets; the value of n that results is too small by about 28%.
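That constant shortfall is easy to demonstrate (a Python sketch): solving (1 + x)^n = 1/2 exactly versus with the binomial approximation gives a ratio of 0.5/ln(2), about 0.7213, for every small x:

import math

for x in (-1e-3, -1e-6, -1e-9):
    n_exact = math.log(0.5) / math.log1p(x)
    n_approx = -0.5 / x              # from 1 + nx = 1/2
    print(x, n_approx / n_exact)     # ~0.7213, independent of x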
 

MrAl

Joined Jun 17, 2014
11,342
Hi again,

Oh ok, that makes sense.

I am not sure if you are interested in this kind of solution or not, but here's another idea.

If we change the form of the ORIGINAL solution a little we get this:
k=-(log(p))/log(D/(D-1.0))

where I am making it more clear that in the denominator of the denominator the number 1.0 is a floating point number. Thus the whole denominator of the denominator is a float: d2 = D - 1.0, and this helps to see what is happening, I think. Inside the denominator log we have D/(D - 1.0), and that is also a float.

Now when D becomes large, we eventually reach a float which, if decremented by 1.0, results in the SAME representation in float format, so when we compute D - 1.0 we get EXACTLY D back again. This of course leads to log(D/D) = log(1) = 0, which of course triggers a number cruncher error (a divide by zero).

To find out when this happens, we can estimate the resolution to be around 1e-16, so that would be D roughly around 1e16. But what would happen if we went to a better-resolution number cruncher? If we had a resolution of 1e-32, for example (much better resolution), then the pseudo-zero differential would be 1e-32, pushing D up to 1e32, up quite a bit from 1e16. Similarly, if we go to a resolution of 1e-64 we push D up to 1e64. So we get up quite a bit higher by increasing the resolution. If we went up one more time, to a resolution of 1e-128, we would exceed that needed for D = 2^256.

I realize not everyone likes this kind of solution, though. It requires a better number cruncher, or just a little programming to build floats with higher resolution on top of the innate resolution. The result is a higher-resolution number cruncher that can be used for various purposes, such as testing results from lower-resolution number crunchers. Yes, the processing is slower, but in many cases it's still so fast you don't even know it happened at all when the formula is small like this one.

I've found this kind of solution the only solution in some cases. An example was when trying to find the smallest circle that will encircle a number of equal smaller circles and get a result with very high precision.

Also, by setting the limits on the parameters for this example, where we went to a resolution of about 1e-128, we would then have to specify that D has to be less than 1e128 (or whatever it actually is; that is just a quick estimate).
Interestingly, D must also be greater than 1 unless we can accept a complex number as a solution, like log(3) + pi*i :)
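For what it's worth, here is a sketch of that idea using an arbitrary-precision library (Python's mpmath, assuming that's available; about 100 digits is enough to keep D and D - 1.0 distinct when D = 2^256):

from mpmath import mp, mpf, log

mp.dps = 100                       # ~100 significant digits
D = mpf(2) ** 256
p = mpf(1) / 2
k = -log(p) / log(D / (D - 1))     # the rearranged exact formula
print(k)                           # ~0.6931 * 2^256 ~ 8.03e76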
 
Last edited:

Thread Starter

WBahn

Joined Mar 31, 2012
29,932
Hi again,

Oh ok, that makes sense.

I am not sure if you are interested in this kind of solution or not, but here's another idea.

If we change the form of the ORIGINAL solution a little we get this:
k=-(log(p))/log(D/(D-1.0))

where I am making it more clear that in the denominator of the denominator the number 1.0 is a floating point number. Thus the whole denominator of the denominator is a float: d2 = D - 1.0, and this helps to see what is happening, I think. Inside the denominator log we have D/(D - 1.0), and that is also a float.

Now when D becomes large, we eventually reach a float which, if decremented by 1.0, results in the SAME representation in float format, so when we compute D - 1.0 we get EXACTLY D back again. This of course leads to log(D/D) = log(1) = 0, which of course triggers a number cruncher error (a divide by zero).

To find out when this happens, we can estimate the resolution to be around 1e-16, so that would be D roughly around 1e16. But what would happen if we went to a better-resolution number cruncher? If we had a resolution of 1e-32, for example (much better resolution), then the pseudo-zero differential would be 1e-32, pushing D up to 1e32, up quite a bit from 1e16. Similarly, if we go to a resolution of 1e-64 we push D up to 1e64. So we get up quite a bit higher by increasing the resolution. If we went up one more time, to a resolution of 1e-128, we would exceed that needed for D = 2^256.

I realize not everyone likes this kind of solution, though. It requires a better number cruncher, or just a little programming to build floats with higher resolution on top of the innate resolution. The result is a higher-resolution number cruncher that can be used for various purposes, such as testing results from lower-resolution number crunchers. Yes, the processing is slower, but in many cases it's still so fast you don't even know it happened at all when the formula is small like this one.

I've found this kind of solution the only solution in some cases. An example was when trying to find the smallest circle that will encircle a number of equal smaller circles and get a result with very high precision.

Also, by setting the limits on the parameters for this example, where we went to a resolution of about 1e-128, we would then have to specify that D has to be less than 1e128 (or whatever it actually is; that is just a quick estimate).
Interestingly, D must also be greater than 1 unless we can accept a complex number as a solution, like log(3) + pi*i :)
Since we would need a mantissa that had 256 bits of resolution just to get 1 bit of resolution in our answer, we would need another 23 bits just to get to the same resolution in the result that we get from a 32-bit IEEE single-precision floating point representation (about 7 sig figs). Then we need bits for the exponent, but that would only require another 10 bits or so. And then we need a sign bit. Hence we need a floating point representation of about 300 bits. Since floating point representations generally increase by factors of two, that would take us to a 512-bit representation.

And that's just for a 256-bit hash function. There are 512-bit and 1024-bit hash functions out there. In fact, there's at least one 2048-bit hash function out there, which would require a 4096-bit floating point representation to do it this way.

But I can get the result to seven sig figs by hand in a few minutes using a decent log table.
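For comparison, there is also a way to get those seven sig figs in ordinary doubles without any extended precision (a Python sketch; math.log1p evaluates ln(1 + x) for tiny x without ever forming the catastrophic 1 - 1/D intermediate):

import math

p = 0.5
D = 2.0 ** 256
k = math.log(p) / math.log1p(-1.0 / D)
print(k)    # ~8.03e76, i.e. ln(2) * 2^256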
 

MrAl

Joined Jun 17, 2014
11,342
Since we would need a mantissa that had 256 bits of resolution just to get 1 bit of resolution in our answer, we would need another 23 bits just to get to the same resolution in the result that we get from a 32-bit IEEE single-precision floating point representation (about 7 sig figs). Then we need bits for the exponent, but that would only require another 10 bits or so. And then we need a sign bit. Hence we need a floating point representation of about 300 bits. Since floating point representations generally increase by factors of two, that would take us to a 512-bit representation.

And that's just for a 256-bit hash function. There are 512-bit and 1024-bit hash functions out there. In fact, there's at least one 2048-bit hash function out there, which would require a 4096-bit floating point representation to do it this way.

But I can get the result to seven sig figs by hand in a few minutes using a decent log table.

Hi again,

Well, if you need 1000 bits then you need 1000 bits, but inside computer land this is nothing and requires little code to achieve. The reason for using a computer is to reduce the work needed to be done by hand, so it makes it economical anyway. I have used resolutions of even 2000 decimal places. What is that, maybe 3000-plus bits? Even in BCD that's 8000 bits, and who cares... it's a machine doing the work, and it's still fast for small formulas. 8000 bits is a mere 1k of storage space on computers that today have gigabytes of memory and hundreds of gigabytes of hard drive space.

If you prefer a table, then you can either scan it into the computer or enter it, or better yet find an online reference table and store that, then read it in with a program (which I am darn sure you know how to write) and go from there, either with interpolation or a full-blown curve fit.

Just some suggestions, which I base on some 40 years with hand calculators and PC computers and simple and complex engineering formulas and tasks.
One interesting case I ran into years ago was a very nice formula for inductance, but it was in the form of a graph, not a table of data like you are fortunate enough to have. After curve fitting the graph I was able to obtain a formula with maybe 4 different constants and some multiplication and division. So it went from having to constantly retrieve the book and open it to look at that graph, to just double clicking an icon :)

I also realize some people just like doing things a certain way, so I just make suggestions that might help.
 