Search a Binary Search Tree

Thread Starter

zulfi100

Joined Jun 7, 2012
656
Hi,
I know the running time of Binary Search Tree i.e O(lg N). Currently i am reading a data structure book in connection with Binary Search Tree. In this topic they have mentioned the concept of IPL (Internal Path Length as the sum of all path lengths of all nodes) & used the formula : Summation (i-1) li. However then they talk about average IPL as : IPL/n, but in the

Path(worst) = 1/n Summation (i-1)
they did not use the term li.

I cant understand why they eliminated term li ? I have attached a picture of the book's page. Can somebody please guide me?

Zulfi.
 

Attachments

sailorjoe

Joined Jun 4, 2013
364
Zulfi, did you notice that the worst case path length occurs when the tree becomes a linked list?
Li is the number of nodes on level I of the tree. How many nodes are on any level of a linked list?
 

Thread Starter

zulfi100

Joined Jun 7, 2012
656
Thanks. I got this idea. When the shape of tree turns into a LL (i.e straight line), we would have one node in each level. So li is one (i.e '1') so we can neglect it (i.e li). But now I have two Questions:

i) Why is this the worst case? I mean when the Binary Search tree becomes the LL why its considered as the worst case??
ii) Why in this worst case are we considering average path length? i mean why are we dividing the equation by n?

Some body please guide me.

Zulfi.
 

sailorjoe

Joined Jun 4, 2013
364
It's the worst case precisely because there is only one leaf per node, so the number of nodes is the number of leaves, minus one. That's a lot of nodes for the given number of leaves. Draw a picture of a regular binary tree with say, three or four levels of depth. How many comparisons does it take to get from the top to the bottom?
Now draw a linked list with the same number of leaves. Look at the difference, and you'll see why it's the worst case. That's the insight for i.
For ii, consider if you do a search for each element in the linked list, how many steps will each one take? Then, what would be the average number of steps? And in this case, you aren't dividing by n. Look again at the formula for path(worst).
 

WBahn

Joined Mar 31, 2012
29,979
It's the worst case precisely because there is only one leaf per node, so the number of nodes is the number of leaves, minus one.
???

A leaf node is a node with zero children. A worst case BST (one that is effectively a linked-list) has one leaf node total regardless of how many nodes are in the tree. Thus, knowing the number of leaf nodes doesn't tell you any thing about the number of nodes in such a tree.

I think perhaps you meant to say that in a worst case BST there is one child per node; however, in ANY tree, the number of nodes is equal to the number of children minus one because every node, except the root, has exactly one parent. Did you mean to say that, in a worse-case BST, the height of the tree is equal to number of nodes, minus one, since each node that is added to the root creates a new level?
 

WBahn

Joined Mar 31, 2012
29,979
Note that the passage of the text only deals with best case (a complete binary tree) and worst case (a linked list). It does not even address average case overall, merely average case for each of these two special cases. The true average case is somewhere between these and works out to be approximately sqrt(n), IIRC.
 

sailorjoe

Joined Jun 4, 2013
364
Yes, your language is more precise than mine. Since this is homework I didn't want to dump the whole story, so cut corners on my description.
The book does point out that the average case is highly dependent on the shape of the tree, but that's as far as they take it on the page we get to see.
 

Thread Starter

zulfi100

Joined Jun 7, 2012
656
Hi,
Hi,

Thanks, WBhan & Sailojoe,

<It's the worst case precisely because there is only one leaf per node, so the number of nodes is the number of leaves, minus one.

???>

<A leaf node is a node with zero children. A worst case BST (one that is effectively a linked-list) has one leaf node total regardless of how many nodes are in the tree. Thus, knowing the number of leaf nodes doesn't tell you any thing about the number of nodes in such a tree.


I am able to understand what Sailorjoe said but WBahn you are 100% correct (SailorJoe also acknowledges it).

<It's the worst case precisely because there is only one leaf per node>…..(1)

This would make a straight line tree which is same as LL (but not to array because in arrays we can use binary search) and in LL (but not in arrays because in arrays we have random access) if we want to traverse the last node (which is the worst case) in LL then we have to visit all nodes, there is no short cut.

Thanks for your help on this.

I think following is not correct (please give an example what you mean by this)

<, so the number of nodes is the number of leaves, minus one.>

And also this:

<That's a lot of nodes for the given number of leaves.>




<in ANY tree, the number of nodes is equal to the number of children minus one because every node, except the root, has exactly one parent. >

Sorry I don’t know how you got this. Suppose I have a tree with a root & 2 children then according to your saying there would :

Number of nodes = number of children -1= 2-1 = 1 (which is wrong because we have 3 nodes in the tree)


Any way I am not finding the number of nodes at this point. I would try this in some other post.

<For ii, consider if you do a search for each element in the linked list, how many steps will each one take? Then, what would be the average number of steps? And in this case, you aren't dividing by n.>

Why are we searching each element of the list??? We are talking about the worst. In worst case, our element is the last element of LL. So we have to search only the last element. So technically it should be O(n) but the book (which also comes up with O(n) running time) has used a different technique which you are saying. i.e for first element searching, we have one step, for second element search we have two steps and for n element search we have ‘n’ steps. This means we are using the eq n* (n+1)/2 & n*(n+1)/(2 * n) is O(n). But we need two loops for this type of searching (i.e for first element searching, we have one step, for second element search we have two steps and for n element search we have ‘n’ steps and its running time would involve summation of steps for 1st element , second element & so on )where as in searching last element in LL we need only one loop & there is no involvement of sum type of thing as the book is doing here and you are telling me. This is what I cant understand.

So I am still not able to understand why they have found average & the why they have used summation?

Thanks.
 
Last edited:

WBahn

Joined Mar 31, 2012
29,979
I think following is not correct (please give an example what you mean by this)

<in ANY tree, the number of nodes is equal to the number of children minus one because every node, except the root, has exactly one parent. >

Sorry I don’t know how you got this. Suppose I have a tree with a root & 2 children then according to your saying there would :

Number of nodes = number of children -1= 2-1 = 1 (which is wrong because we have 3 nodes in the tree)
You are correct -- I miswrote. The statement that, in any tree, each node except the root has exactly one parent is correct. The conclusion from that is that the number of nodes is therefore equal to the number of children is equal to the number of nodes minus one or, equivalently, the number of nodes is equal to the number of children plus one.

Good catch.

<For ii, consider if you do a search for each element in the linked list, how many steps will each one take? Then, what would be the average number of steps? And in this case, you aren't dividing by n.>

Why are we searching each element of the list??? We are talking about the worst. In worst case, our element is the last element of LL. So we have to search only the last element. So technically it should be O(n) but the book (which also comes up with O(n) running time) has used a different technique which you are saying. i.e for first element searching, we have one step, for second element search we have two steps and for n element search we have ‘n’ steps. This means we are using the eq n* (n+1)/2 & n*(n+1)/(2 * n) is O(n). But we need two loops for this type of searching (i.e for first element searching, we have one step, for second element search we have two steps and for n element search we have ‘n’ steps and its running time would involve summation of steps for 1st element , second element & so on )where as in searching last element in LL we need only one loop & there is no involvement of sum type of thing as the book is doing here and you are telling me. This is what I cant understand.

So I am still not able to understand why they have found average & the why they have used summation?
The "worst case" generally refers to the data organization, not the specific data value that is being sought (though it COULD refer to that -- it depends on the context).

So, absolute worst case would be if the tree is equivalent to a links list AND we happen to be searching for the last item in the list. That would involve n comparisons and would therefore be O(n).

But if we are talking about the expected number of comparisons that would be needed to find a value in a tree that is equivalent to a linked list, then the expected number of comparisons would be (n(n+1))/(2n) = (n+1)/2, which is still O(n).

Which you would you depends on the specific details of the problem you are trying to solve.
 
Top