Algorithm Wanted

Thread Starter

joeyd999

Joined Jun 6, 2011
5,234
Suppose I have a set A of n numbers.

Given a number x, I wish to choose a subset B of A such that B has the m elements of A that are closest (numerically) to x.

I would like to do this as fast as possible. I would prefer not to sort A first, but if I needed to, I could not sort in place (thus doubling the size of ram required).
 

WBahn

Joined Mar 31, 2012
29,978
Suppose I have a set A of n numbers.

Given a number x, I wish to choose a subset B of A such that B has the m elements of A that are closest (numerically) to x.

I would like to do this as fast as possible. I would prefer not to sort A first, but if I needed to, I could not sort in place (thus doubling the size of ram required).
Why would sorting in place double the size of RAM required? The whole idea of "in place" is that it does it in the same (or nearly the same) memory footprint.

Do you have enough memory to store the m elements in B separate from the n elements in A? If so, then you should be able to use a greedy algorithm in ~O(n) time. How much larger is n than m (typically)?
 

Thread Starter

joeyd999

Joined Jun 6, 2011
5,234
Why would sorting in place double the size of RAM required? The whole idea of "in place" is that it does it in the same (or nearly the same) memory footprint.
I cannot sort in place, therefore a sort would require twice the ram.

Do you have enough memory to store the m elements in B separate from the n elements in A? If so, then you should be able to use a greedy algorithm in ~O(n) time. How much larger is n than m (typically)?
For this application, the data is 24 bit integers. n is 32. m will likely be 8, 16, or 24.

Time is more important than memory footprint. I'll sort if I have to. Just wondering if there is a way to do it without sorting first.
 

Thread Starter

joeyd999

Joined Jun 6, 2011
5,234
In fact, I don't even really require a set B to be generated. All I really need to do is identify the m elements of A that are closest to x, such that I can operate on just those elements (either during the "search" or after).

The only rule is I cannot corrupt the original order of the data in A.
 

WBahn

Joined Mar 31, 2012
29,978
I cannot sort in place, therefore a sort would require twice the ram.
Ahh, I see. I thought you were saying that the reason that you cannot sort in place is because doing so would double the RAM requirements. Now I understand that you were saying that it would double the RAM requirements because you lack the ability to sort in place. Gotcha.

For this application, the data is 24 bit integers. n is 32. m will likely be 8, 16, or 24.

Time is more important than memory footprint. I'll sort if I have to. Just wondering if there is a way to do it without sorting first.
Do it greedy. If you want the closest, say, 8 values then grab the first 8 values. Then go through the remaining values and see if they should replace the furthest value or not. Once you've processed the last item, you have your subset.

As usual, the key to getting good performance is tied up with the nature of the data, so there are numerous ways that might improve it but each might make things either better or worse depending on that nature.

As you grab your values, put them in order of difference from x. So if x is 1000 you put 1002 before 995 before 1010. After you have your eight values, you compare the next candidate to the last item in the list and if it isn't closer, then you move on. If it is closer, then you insert it into the list at its proper location (making the furthest one drop off) and continue. You may pick up some gain by storing intermediate values (such as the upper and lower limits) instead of recalculating them.
 
Top