Time for Q-sort

Thread Starter

zulfi100

Joined Jun 7, 2012
656
Hi,
Can somebody plz guide me with the following question:

A machine wants a minimum of 100 sec to sort 1000 names by quick sort. What is the minimum time needed to sort 100 names?
I used n*lgn formula but my answer is not correct. Kindly guide me.

Zulfi.
 

WBahn

Joined Mar 31, 2012
30,088
Zulfi, we've been down this road literally dozens of times by now. Show your work! Also, if you know what the correct answer is supposed to be, provide that, too. Sometimes the "correct" answer isn't so correct.
 

Thread Starter

zulfi100

Joined Jun 7, 2012
656
Hi,
Thanks for your reply. The answer given in the book is 6.7 sec. Using formula
n * ln (n), its 460.

Kindly guide me about this.

Zulfi.
 

Thread Starter

zulfi100

Joined Jun 7, 2012
656
Hi,
Thanks for your reply.

Here n= 100 names, and ln(100)=4.6. So answer=460.

Kindly guide about my mistake in the calculation.

Zulfi.
 

WBahn

Joined Mar 31, 2012
30,088
460 what?

UNITS!!!!!!!

n is either a pure number or has units of "names". If we take it as a pure number, then

n*ln(n) is dimensionless, while you want something that has units of seconds.

Therefore you KNOW that this answer is WRONG!

So you go back and look at it and see that you need to scale things appropriately.

T(n) is PROPORTIONAL to n*ln(n), it is not EQUAL to n*ln(n). You make them equal by introducing a proportionality constant, k.

T(n) = k*n*ln(n)

We know that

T(1000) = k*1000*ln(1000) = 100s

k = 100s/(1000*ln(1000)) = 14.48ms

Gee, look what happens. Our proportionality constant has units of time!

T(100) = 14.18ms * 100 * ln(100) = 6.67s

Do you see why I am always harping on units?

Once you realize that you need a proportionality constant, you can also realize that the base that you use for the logarithm is arbitrary since the log of one base differs from another by a scaling constant. So we can use base-10 logs instead:

T(1000) = k*1000*log(1000) = 100s

k = 100s/(1000*log(1000)) = (1/30)s

T(100) = (1/30)ms * 100 * log(100) = (200/30)s = 10*(2/3)s = 6.67s

Now, recall that I said that n could either be treated as a pure number or as a quantity with the unit of "names"? I prefer the latter, but that introduces a minor complexity that needs to be dealt with (and that is easy to deal with).

I'll put that in a follow-up post because I have to get out the door right now.
 

absf

Joined Dec 29, 2010
1,968
Hi,
Thanks for your reply.

Here n= 100 names, and ln(100)=4.6. So answer=460.

Kindly guide about my mistake in the calculation.

Zulfi.
If that's the answer, what is the purpose of telling you that n=1000 would take 100 seconds in the question in the first place? So it was very clear that your procedure and answer was wrong....

Allen
 

Thread Starter

zulfi100

Joined Jun 7, 2012
656
Hi,
If that's the answer, what is the purpose of telling you that n=1000 would take 100 seconds in the question in the first place? So it was very clear that your procedure and answer was wrong....

Allen
Thanks for your reply. Yes my procedure was wrong b/c i did not know that i have to use the recurrence eq. At first i thought that i could solve it using unitary method. However, then i realized that its easy and can be done by using big-O notation. I am still not able to go through all the details of explanation provided in this regard by our friend wbahn. But one another thing which i forgot is that q-sort does not have a constant running time. Ofcourse the real reason is the unit.

Zulfi.
 

Thread Starter

zulfi100

Joined Jun 7, 2012
656
Hi,
Thanks WBahn for exposing all this detail which we may not find in books.
I have understood this but if you get time plz tell me the problem associated with using 'n' unit as 'names' instead of a pure number.
Zulfi.
 

WBahn

Joined Mar 31, 2012
30,088
The problem with using n as a physical quantity (in this case "names") is that transcendental functions such as sin(), cos(), ln(), etc., require dimensionless arguments.

The most brute force way to deal with this is to introduce a scaling constant into the function as follows:

T(n) = k * n * ln(a*n)

Now 'k' has units of "seconds per name" and 'a' has units of "name^-1". If you can arbitrarily set 'a' to be equal to 1/name.

We still have the known data point:

T(No) = To = k * No * ln(a * No)

so

k = (To/No) / ln(a * No)

Hence

T(n) = (To/No) / ln(a*No) * n * ln(a*n)

T(n) = To*(n/No) * [ln(a*n) / ln(a*No)]

Notice that the value of 'a' is not truly arbitrary. By choosing it to be equal to 1/name, what we have done is forced our equation to behave the same as the model we started with, namely something proportinal to n*ln(n). But what if we choose a different value of 'a'?

Well, notice that our initial equation isn't well behaved for n=1 since ln(1) = 0. So this would say that it should take zero time to process 1 name. You could argue that this is reasonable since, if there is only one name, you know it is sorted. But, in general, your algorithm would be doing some actual processing on each item and therefore would require some amount of time even for one item.

The reason we generally don't worry about this is that we aren't looking for an exact model, but rather a model that describes the asymptotic behavior as n gets arbitrarily large. We usually neither insist nor expect useful values for small values of n.

If we want to make an equation that will work all the way down to n=1 name and also goes through a particular point for some other number of names, then we need two constants, which we have -- 'k' and 'a' -- and so we just solve for both. If we just have the single data point, then we assume 'a' has unit magnitude and solve for the other.
 
Top