Height of the heap

Thread Starter

AndrieGnd

Joined Jun 25, 2019
52
Hi guys, I have some confusion that I hope to clear up.

A heap has maximum height log(n). So if I want to find, for example, the minimum of a max heap, which is found among the leaves, I must walk the whole height, which is log(n), and it seems we must also include the time spent at the leaf checking whether it is the minimum or not. But my professor said that the time to find the minimum of a max heap is just the walk along the height, without including the time spent at the leaf itself (I mean the comparisons done at the leaf to test for the minimum; we also do work at the leaf, so why isn't it included in the time complexity calculation?). Can anyone explain why we don't count the time spent at the leaf itself checking for the minimum? Thanks a lot for your help, guys.
 

WBahn

Joined Mar 31, 2012
30,045
If your heap is represented as a tree, then finding the minimum value in a max heap requires that you touch every node in the tree, because the minimum could be ANY leaf node, and examining all the leaves requires a full traversal of the tree. In fact, we'll traverse nearly every edge in the tree twice: once going down and once backtracking.

If your heap is represented as an array, then you can calculate the index of the first leaf node and simply walk down the array from that point. To do this you must examine approximately half the nodes in the heap.

Neither of these is log(n).
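
To make the array version concrete, here's a minimal Python sketch (assuming the standard 0-indexed layout where the children of index i sit at 2i+1 and 2i+2, so everything from index n/2 onward is a leaf; the function name is just illustrative):

```python
def find_min_in_max_heap(heap):
    """Return the minimum of a max heap stored as a 0-indexed array.

    With children of index i at 2*i + 1 and 2*i + 2, every index from
    len(heap) // 2 onward is a leaf. The minimum must be among the
    leaves, but it could be ANY of them, so we scan them all --
    roughly half the nodes, i.e. O(n), not O(log n).
    """
    if not heap:
        raise ValueError("empty heap")
    first_leaf = len(heap) // 2
    return min(heap[first_leaf:])

print(find_min_in_max_heap([9, 5, 8, 1, 2, 7]))   # -> 1
```

No matter how the values are arranged, that scan touches about n/2 elements, which is where the O(n) comes from.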

As for why you don't need to separately account for the time required to do a comparison at each node (or each leaf node): that time is just a multiplying factor out front.

Remember that time complexity tells us how the time required to solve a problem grows as the problem size gets very large.

Imagine that you have been asked to provide a quote for inventorying a bunch of storerooms. You are told the square footage of each room and how many items there are per square foot. You know that you can inventory 20 items a minute, and also that each room has a locked door that takes one minute to open. If each storeroom is a broom closet (say 10 square feet) with only 1 item per square foot, then the time to open the door is a considerable fraction of the total time to inventory the room, and you would include it in your quote. But what if each storeroom were a 10,000 square foot warehouse with 100 items per square foot? Would you bother adding in the one minute needed to open each door?

Consider the time calculation for some task as a function of the size of the problem:

T(n) = 1·n² + 1000·n + 1,000,000

As the problem grows larger and larger, the impact of the last two terms eventually gets overshadowed by the first term.

We would thus call this an O(n²) algorithm meaning that if we double the size of the problem, the amount of time required increases by approximately a factor of 4 provided we are talking about sufficiently large problem sizes.

See for yourself.

Using the full T(n) with n = 1, when we double it to n = 2:
T(2)/T(1) = 1,002,004 / 1,001,001 = 1.001 (doubling the size of the problem only increased the amount of time by one-tenth of one percent)
T(20)/T(10) = 1.010
T(200)/T(100) = 1.117
T(2,000)/T(1,000) = 2.333
T(20,000)/T(10,000) = 3.793
T(200,000)/T(100,000) = 3.980
T(2,000,000)/T(1,000,000) = 3.998
T(20,000,000)/T(10,000,000) = 4.000
T(200,000,000)/T(100,000,000) = 4.000
T(2,000,000,000)/T(1,000,000,000) = 4.000

Now do the same thing for just the dominant term, n², and you get a factor of exactly 4 regardless of the size of n.
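
If you want to check these numbers yourself, here's a quick sketch that reproduces the table (T simply mirrors the formula above):

```python
def T(n):
    return 1 * n**2 + 1000 * n + 1_000_000

# Ratio of the full T(n), and of the bare n^2 term, as n doubles.
for k in range(1, 10):
    n = 10**k
    full = T(2 * n) / T(n)
    bare = (2 * n)**2 / n**2          # always exactly 4
    print(f"T({2 * n:,})/T({n:,}) = {full:.3f}  vs.  n^2 ratio = {bare:.1f}")
```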

So we can see that for problem sizes more than about ten thousand, the growth is very close to our n² time complexity. For smaller problems this isn't a good match -- but it's not meant to be a good match for small problem sizes!

Even the leading coefficient of the dominant term goes away. Repeat the above example except make the first term 10·n² or 0.01·n². The only thing that happens is that the problem size that counts as "big enough" changes, but the bottom line is still the same: for sufficiently large problems, the time needed to solve the problem goes up by a factor of 4 as the size of the problem goes up by a factor of 2.
 

Thread Starter

AndrieGnd

Joined Jun 25, 2019
52
Thanks for your helpful answer.
So if I understand you correctly: strictly speaking we should include the time spent at the leaf itself, but it doesn't change anything, and that's why we neglect it? I mean, suppose I walk from the root down to a leaf, a path of log(n) edges, doing two comparisons at every node (including the leaf). When I arrive at the leaf, the time spent there should be counted, but it can be neglected. So that's why we say the time complexity is path × work per node = log(n) × two comparisons, where the two comparisons at the leaf are dropped because they don't affect the time complexity, and once we reach the leaf we implicitly stop, with nothing more to worry about, like a "halt". Am I right in my conclusion? I hope so! :)
 

WBahn

Joined Mar 31, 2012
30,045
I think you are getting some of it, but I think there are still holes.

Keep in mind what time complexity is and is not. It is NOT something that tells us how long an algorithm will take to execute, not even as an estimate. It is an upper bound on how fast the time to solve a problem grows as the problem size gets larger and larger. In nearly all cases, if we were to do an exact, detailed analysis and come up with an equation that tells us exactly how long a particular implementation of an algorithm will take to execute, we would discover that there is a single term that dominates the result as the problem size gets large. Why do the detailed analysis and spend a lot of time in the weeds on all of those other terms that, for large problem sizes, have virtually no effect? So learn how to identify the dominant term and ignore the rest.

Apart from that, the example you are using is finding the minimum value in a max heap data structure. Assuming we are talking about a tree implementation (and not an array-based one), the path we take to find the minimum value is not log(n). That only gets us from the root to ONE of the leaf nodes. We have to get to ALL of them, since ANY one of them could be the minimum. So we need to traverse the tree, which for almost every edge means going down the edge to the child and then later coming back up that edge to continue the traversal. Thus the number of edges is equal to the number of nodes minus one, and the number of edge traversals is roughly twice the number of nodes (we only go down the right edge of the heap once, since we don't have to climb back up after visiting the right-most leaf). The computation we have to do to navigate the tree is essentially proportional to the number of nodes we visit: at each node we check whether it has a left child and, if so, push down to it; then we check whether it has a right child and, if so, push down to that; then we push back up to the parent.
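
To make that traversal concrete, here's a minimal Python sketch of the linked-node case (the Node class and the find_min name are just illustrative; a real implementation would also maintain the heap's shape property):

```python
class Node:
    """One node of a max heap stored as linked nodes."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def find_min(node):
    # In a max heap every child is <= its parent, so a node with
    # children can never hold the subtree minimum; the minimum lives
    # in some leaf. We still must visit every node to reach every
    # leaf, so this is O(n), not O(log n).
    if node.left is None and node.right is None:
        return node.value                  # a leaf: candidate minimum
    candidates = []
    if node.left is not None:
        candidates.append(find_min(node.left))
    if node.right is not None:
        candidates.append(find_min(node.right))
    return min(candidates)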

Remember, a heap is NOT a binary search tree. Finding the minimum in a binary search tree DOES involve a single walk down the tree that is only as long as the tree's height, but that's because a binary search tree is fully ordered, so at each step we KNOW which child leads to the value we seek. A heap is only partially ordered. That makes some tasks faster but other tasks slower; as with nearly everything in engineering, it's a compromise between competing goals. For instance, finding the max value is constant time: no matter how big the heap is, the max value is at the root. One of the penalties is that finding the min value takes much longer. So we use a max heap when we are usually interested only in the max value and seldom in the min (or, for that matter, in finding any particular value other than the max).
 

Thread Starter

AndrieGnd

Joined Jun 25, 2019
52
I understand you very well, but I think you're not getting my point.
What's confusing me is the time spent at the leaf, which we almost never include in any of the calculations. For example, if we pass from root to leaf and do "some work" at every edge/level, at the leaf we neglect that work. We neglect it because it wouldn't affect the time complexity, yes? So we should count it, but at the same time we can drop it because we treat time complexity asymptotically. I mean: c·log(n) on the edges + the work at the leaf = O(log n), because the work at the leaf is negligible compared with the time spent along the edges. Right?!
 

Thread Starter

AndrieGnd

Joined Jun 25, 2019
52
Hi guys, suppose I have a max heap in which all the node values are distinct, and I delete the maximum (the root) using the usual delete-max procedure: take the right-most leaf and put it in place of the root (the maximum).

My question: after doing one delete-max on the max heap, is it always necessary to "heapify down"? I mean, is it possible that the leaf we placed at the root is already sitting in the right position and satisfies the heap property without heapifying the new root down? (Again, all the node values are distinct.)

Thanks a lot
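
For reference, the procedure I mean is something like this minimal array-based sketch (0-indexed, children of index i at 2i+1 and 2i+2; the name delete_max is just for illustration):

```python
def delete_max(heap):
    """Remove and return the maximum of a max heap stored as a 0-indexed array.

    The last element (the right-most leaf) replaces the root, then
    sifts down, swapping with its larger child until neither child
    is bigger. If the moved leaf already beats both children, the
    loop stops immediately and no swap happens.
    """
    max_value = heap[0]
    heap[0] = heap[-1]
    heap.pop()
    i = 0
    n = len(heap)
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < n and heap[left] > heap[largest]:
            largest = left
        if right < n and heap[right] > heap[largest]:
            largest = right
        if largest == i:
            break                          # heap property restored
        heap[i], heap[largest] = heap[largest], heap[i]
        i = largest
    return max_value
```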
 