Height of the heap

Thread Starter

AndrieGnd

Joined Jun 25, 2019
52
Hi guys, I have some confusion that I hope to clear up.

A heap has maximum height log(n). So if I want to find, for example, the minimum of a max heap, which is found among the leaves, I must walk the whole height, which is log(n), and it seems we must also include the time spent at the leaf checking whether it is the minimum or not. But my professor said that the time to find the minimum of a max heap is just the walk along the height, without including the time spent at the leaf itself (I mean the comparisons done at the leaf to test for the minimum; we also do work at the leaf, so why isn't it included in the time complexity calculation?). Can anyone explain why we don't count the time spent at the leaf itself checking for the minimum? Thanks a lot for your help, guys.
 

WBahn

Joined Mar 31, 2012
30,045
If your heap is represented as a tree, then finding the minimum value in a max heap requires that you touch every node in the tree, because the minimum could be ANY leaf node, and examining all the leaves requires a full traversal of the tree. In fact, we'll traverse nearly every edge in the tree twice: once going down and once backtracking.

If your heap is represented as an array, then you can calculate the index of the first leaf node and simply walk down the array from that point. To do this you must examine approximately half the nodes in the heap.

Neither of these is log(n).
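
To make the array version concrete, here's a minimal Python sketch (assuming the standard 0-indexed layout where the children of index i sit at 2i+1 and 2i+2, so everything from index n/2 onward is a leaf; the function name is just illustrative):

```python
def find_min_in_max_heap(heap):
    """Return the minimum of a max heap stored as a 0-indexed array.

    With children of index i at 2*i + 1 and 2*i + 2, every index from
    len(heap) // 2 onward is a leaf. The minimum must be among the
    leaves, but it could be ANY of them, so we scan them all --
    roughly half the nodes, i.e. O(n), not O(log n).
    """
    if not heap:
        raise ValueError("empty heap")
    first_leaf = len(heap) // 2
    return min(heap[first_leaf:])

print(find_min_in_max_heap([9, 5, 8, 1, 2, 7]))   # -> 1
```

No matter how the values are arranged, that scan touches about n/2 elements, which is where the O(n) comes from.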

As for why you don't need to separately account for the time required to do a comparison at each node (or each leaf node): that time is just a multiplying factor out front.

Remember that time complexity tells us how the time required to solve a problem grows as the problem size gets very large.

Imagine that you have been asked to provide a quote for inventorying a bunch of storerooms. You are told the square footage of each room and how many items there are per square foot. You know that you can inventory 20 items a minute, and also that each room has a locked door that takes one minute to open. If each storeroom is a broom closet (say 10 square feet) with only 1 item per square foot, then the time to open the door is a considerable fraction of the total time to inventory the room, and you would include it in your quote. But what if each storeroom were a 10,000 square foot warehouse with 100 items per square foot? Would you bother adding in the one minute needed to open each door?

Consider the time calculation for some task as a function of the size of the problem:

T(n) = 1·n² + 1000·n + 1,000,000

As the problem grows larger and larger, the impact of the last two terms eventually gets overshadowed by the first term.

We would thus call this an O(n²) algorithm meaning that if we double the size of the problem, the amount of time required increases by approximately a factor of 4 provided we are talking about sufficiently large problem sizes.

See for yourself.

Using the full T(n) with n = 1, when we double it to n = 2:
T(2)/T(1) = 1,002,004 / 1,001,001 = 1.001 (doubling the size of the problem only increased the amount of time by one-tenth of one percent)
T(20)/T(10) = 1.010
T(200)/T(100) = 1.117
T(2,000)/T(1,000) = 2.333
T(20,000)/T(10,000) = 3.793
T(200,000)/T(100,000) = 3.980
T(2,000,000)/T(1,000,000) = 3.998
T(20,000,000)/T(10,000,000) = 4.000
T(200,000,000)/T(100,000,000) = 4.000
T(2,000,000,000)/T(1,000,000,000) = 4.000

Now do the same thing for just the dominant term, n², and you get a factor of exactly 4 regardless of the size of n.
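
If you want to check these numbers yourself, here's a quick sketch that reproduces the table (T simply mirrors the formula above):

```python
def T(n):
    return 1 * n**2 + 1000 * n + 1_000_000

# Ratio of the full T(n), and of the bare n^2 term, as n doubles.
for k in range(1, 10):
    n = 10**k
    full = T(2 * n) / T(n)
    bare = (2 * n)**2 / n**2          # always exactly 4
    print(f"T({2 * n:,})/T({n:,}) = {full:.3f}  vs.  n^2 ratio = {bare:.1f}")
```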

So we can see that for problem sizes more than about ten thousand, the growth is very close to our n² time complexity. For smaller problems this isn't a good match -- but it's not meant to be a good match for small problem sizes!

Even the leading coefficient of the dominant term goes away. Repeat the above example except make the first term 10·n² or 0.01·n². The only thing that happens is that the problem size that counts as "big enough" changes, but the bottom line is still the same: for sufficiently large problems, the time needed to solve the problem goes up by a factor of 4 as the size of the problem goes up by a factor of 2.
 

Thread Starter

AndrieGnd

Joined Jun 25, 2019
52
Thanks for your helpful answer.
So if I understand you correctly: strictly speaking we should include the time spent at the leaf itself, but it doesn't change anything, and that's why we neglect it? I mean, suppose I walk from the root down to a leaf, a path of log(n) edges, doing two comparisons at every node (including the leaf). When I arrive at the leaf, the time spent there should be counted, but it can be neglected. So that's why we say the time complexity is path × work per node = log(n) × two comparisons, where the two comparisons at the leaf are dropped because they don't affect the time complexity, and once we reach the leaf we implicitly stop, with nothing more to worry about, like a "halt". Am I right in my conclusion? I hope so! :)
 

WBahn

Joined Mar 31, 2012
30,045
I think you are getting some of it, but I think there are still holes.

Keep in mind what time complexity is and is not. It is NOT something that tells us how long an algorithm will take to execute, not even as an estimate. It is an upper bound on how fast the time to solve a problem grows as the problem size gets larger and larger. In nearly all cases, if we were to do an exact, detailed analysis and come up with an equation that tells us exactly how long a particular implementation of an algorithm will take to execute, we would discover that there is a single term that dominates the result as the problem size gets large. Why do the detailed analysis and spend a lot of time in the weeds on all of those other terms that, for large problem sizes, have virtually no effect? So learn how to identify the dominant term and ignore the rest.

Apart from that, the example you are using is finding the minimum value in a max heap data structure. Assuming we are talking about a tree implementation (and not an array-based one), the path we take to find the minimum value is not log(n). That only gets us from the root to ONE of the leaf nodes. We have to get to ALL of them, since ANY one of them could be the minimum. So we need to traverse the tree, which for almost every edge means going down the edge to the child and then later coming back up that edge to continue the traversal. Thus the number of edges is equal to the number of nodes minus one, and the number of edge traversals is roughly twice the number of nodes (we only go down the right edge of the heap once, since we don't have to climb back up after visiting the right-most leaf). The computation we have to do to navigate the tree is essentially proportional to the number of nodes we visit: at each node we check whether it has a left child and, if so, push down to it; then we check whether it has a right child and, if so, push down to that; then we push back up to the parent.
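
To make that traversal concrete, here's a minimal Python sketch of the linked-node case (the Node class and the find_min name are just illustrative; a real implementation would also maintain the heap's shape property):

```python
class Node:
    """One node of a max heap stored as linked nodes."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def find_min(node):
    # In a max heap every child is <= its parent, so a node with
    # children can never hold the subtree minimum; the minimum lives
    # in some leaf. We still must visit every node to reach every
    # leaf, so this is O(n), not O(log n).
    if node.left is None and node.right is None:
        return node.value                  # a leaf: candidate minimum
    candidates = []
    if node.left is not None:
        candidates.append(find_min(node.left))
    if node.right is not None:
        candidates.append(find_min(node.right))
    return min(candidates)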

Remember, a heap is NOT a binary search tree. Finding the minimum in a binary search tree DOES involve a single walk down the tree that is only as long as the tree's height, but that's because a binary search tree is fully ordered, so at each step we KNOW which child leads to the value we seek. A heap is only partially ordered. That makes some tasks faster but other tasks slower; as with nearly everything in engineering, it's a compromise between competing goals. For instance, finding the max value is constant time: no matter how big the heap is, the max value is at the root. One of the penalties is that finding the min value takes much longer. So we use a max heap when we are usually interested only in the max value and seldom in the min (or, for that matter, in finding any particular value other than the max).
 

Thread Starter

AndrieGnd

Joined Jun 25, 2019
52
I understand you very well, but I think you're not getting my point.
What's confusing me is the time spent at the leaf, which we almost never include in any of the calculations. For example, if we pass from root to leaf and do "some work" at every edge/level, at the leaf we neglect that work. We neglect it because it wouldn't affect the time complexity, yes? So we should count it, but at the same time we can drop it because we treat time complexity asymptotically. I mean: c·log(n) on the edges + the work at the leaf = O(log n), because the work at the leaf is negligible compared with the time spent along the edges. Right?!
 

Thread Starter

AndrieGnd

Joined Jun 25, 2019
52
Hi guys, suppose I have a max heap in which all the node values are distinct, and I delete the maximum (the root) using the usual delete-max procedure: take the right-most leaf and put it in place of the root (the maximum).

My question: after doing one delete-max on the max heap, is it always necessary to "heapify down"? I mean, is it possible that the leaf we placed at the root is already sitting in the right position and satisfies the heap property without heapifying the new root down? (Again, all the node values are distinct.)

Thanks a lot
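
For reference, the procedure I mean is something like this minimal array-based sketch (0-indexed, children of index i at 2i+1 and 2i+2; the name delete_max is just for illustration):

```python
def delete_max(heap):
    """Remove and return the maximum of a max heap stored as a 0-indexed array.

    The last element (the right-most leaf) replaces the root, then
    sifts down, swapping with its larger child until neither child
    is bigger. If the moved leaf already beats both children, the
    loop stops immediately and no swap happens.
    """
    max_value = heap[0]
    heap[0] = heap[-1]
    heap.pop()
    i = 0
    n = len(heap)
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < n and heap[left] > heap[largest]:
            largest = left
        if right < n and heap[right] > heap[largest]:
            largest = right
        if largest == i:
            break                          # heap property restored
        heap[i], heap[largest] = heap[largest], heap[i]
        i = largest
    return max_value
```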
 