find repeated number

MrChips

Joined Oct 2, 2009
34,828
I am gonna milk this one as long as I can. There are still many weeks of isolation ahead of us.
I am going to supply the solution in drips and dribbles.

The solution is very similar to the concept of hashing.
 

WBahn

Joined Mar 31, 2012
32,877
I haven't given up yet I am watching this thread very closely I am interested to see how do you solve the problem first in your head and then how do you make flow chart for that.
So does that mean that you aren't interested in putting in any more effort on your own? Instead, you are just going to wait for someone else to do all the work and give you the solution?

That's a shame, we were only a couple of steps away from a working solution in https://forum.allaboutcircuits.com/threads/flowchart-find-repeated-number.168406/post-1495131
 

WBahn

Joined Mar 31, 2012
32,877
The solution which I will present does not require sorting or comparisons.
It is simple and fast. I cannot think of any other solution that would be faster.

Can anyone guess what is the solution?
I've already described a linear-time algorithm to solve the problem. It'll be interesting to see how much similarity there is between yours and mine.

A couple of other linear time algorithms would be:

IF we know and have a reasonable range on the values of the numbers (and assuming they are integers), then we can just use the number as an index to an array and increment the value at that index in a counter array.

If we don't have suitable constraints on the values, then we can perform some kind of hash on the value to get a suitable index and do the same thing, except that we need to store the actual value in a list of some kind associated with each index.
 

jpanhalt

Joined Jan 18, 2008
11,087
The solution which I will present does not require sorting or comparisons.
It is simple and fast. I cannot think of any other solution that would be faster.

Can anyone guess what is the solution?
I don't need to guess, post #31 is faster. The gauntlet is down.
 

WBahn

Joined Mar 31, 2012
32,877
I don't need to guess, post #31 is faster. The gauntlet is down.
Perhaps you could walk through a toy example for that algorithm to illustrate how it works.

For instance, let's say I have an array with four values: {2, 19, 4, 19}

Your algorithm says that I need four registers and then I use the values in the array as indices into the array, right. So what do I do when the value is 19?
 

jpanhalt

Joined Jan 18, 2008
11,087
Perhaps you could walk through a toy example for that algorithm to illustrate how it works.

For instance, let's say I have an array with four values: {2, 19, 4, 19}

Your algorithm says that I need four registers and then I use the values in the array as indices into the array, right. So what do I do when the value is 19?
My algorithm worked in simulation with MPLab 8.92. If there is a need to explain it, I will do that later. Basically the act of entering a number creates a marker that can be efficiently (btfsc/s) tested by subsequent numbers for duplication.

Since the original question involved just a few numbers, I simply used relative jumps in RAM to do that. Any memory can be used. With much bigger sets, I would reduce the number of registers/instruction memory used, if required, by addressing individual bits,* much like one does when drawing on a GLCD.

Of course, since the TS seems to be using 16F PIC's, that is what I wrote it for. If one goes outside that hardware limitation and uses an MCU that can address individual bits with a variable (e.g., bsf REGA, x and btfsc/s REGA,x are not specifically allowed when x is a variable), then that changes the implied rules. If that is to be allowed, why not use a PC with gigabits of RAM, which is way out of my league.

*Effectively using %8 to reduce the number of registers needed.
 

iimagine

Joined Dec 20, 2010
512
If memory/storage capacity isnt a problem and the range of number is finite, then the fastest and easiest is to store each number representing its own index: For example
Numbers = [2, 6, 199, 3201, 4.....]
data = []
for (i = 0; i < sizeof(Numbers) - 1; i++)
x = Numbers;
if x > sizeof(data);
resize(data, x);
data[x] = data[x]++;

Of course the array have to be resize each time x is greater then its length.
 
Last edited:

jpanhalt

Joined Jan 18, 2008
11,087
If memory/storage capacity isnt a problem and the range of number is finite, then the fastest and easiest is to store each number representing its own index: For example
Numbers = [2, 6, 199, 3201, 4.....]
data = []
for (i = 0; i < sizeof(Numbers) - 1; i++)
x = Numbers;
if x > sizeof(data);
resize(data, x);
data[x] = data[x]++;

Of course the array have to be resize each time x is greater then its length.
I just "store" the index. It's quicker for later access. How many instructions does that code decompile to?
 

MrChips

Joined Oct 2, 2009
34,828
@WBahn has described the same solution in post #84.

I thought about this when we described the problem as balls in a bag.
Take balls out of the bag one at a time.
Put each ball in a bag and write down on the outside of the bag the number marked on the ball.
This process is the same as the hash process described in a previous post.

How many bags would we need?
If there are 100 balls, then we will need a maximum of 100 bags.
This is how the hash function works. For example, the hash function when given a value "12345", for example, will return a numerical hash index which is going to be the number of our bag. So the solution is simply that the hash index is the same as the number on the ball. In other words, the hash function hash(x) = x.

Our bags will be numbered sequentially from 0 to the max number.
Suppose that we are restricted to numbers from 0-65535. Then we will need 65536 bags. Period. End of story.

Suppose we are told that the numbers are in the range 1 to 100. Then we need 100 bags, regardless of the total number of balls. You take one ball at a time and put it into its proper bag marked with the same number on the outside of the bag. Given boundaries on the input we can place restrictions on the model.

The point here is, you take a problem that may appear abstract mathematically and find a real world working solution to the problem, such as putting numbered balls into numbered bags.

Here is the engineering problem solving methodology.

1. Define the problem. List the specifications, limitations and restrictions.
2. Create a working solution after researching all possibilities.
3. Draw a flowchart.
4. Implement the solution.
5. Test the solution
 

MrChips

Joined Oct 2, 2009
34,828
If you still cannot do this on your own, here is a code snippet of the solution:

for (i = 0; i < n; i++ ) ++count( x(i) );

Correction:
The corrected statement is:
for (i = 0; i < n; i++ ) ++count[ x(i) ];
 

jpanhalt

Joined Jan 18, 2008
11,087
@WBahn has described the same solution in post #84.

I thought about this when we described the problem as balls in a bag.
Take balls out of the bag one at a time.
Put each ball in a bag and write down on the outside of the bag the number marked on the ball.
This process is the same as the hash process described in a previous post.

How many bags would we need?
If there are 100 balls, then we will need a maximum of 100 bags.
This is how the hash function works. For example, the hash function when given a value "12345", for example, will return a numerical hash index which is going to be the number of our bag. So the solution is simply that the hash index is the same as the number on the ball. In other words, the hash function hash(x) = x.

Our bags will be numbered sequentially from 0 to the max number.
Suppose that we are restricted to numbers from 0-65535. Then we will need 65536 bags. Period. End of story.

Suppose we are told that the numbers are in the range 1 to 100. Then we need 100 bags, regardless of the total number of balls. You take one ball at a time and put it into its proper bag marked with the same number on the outside of the bag. Given boundaries on the input we can place restrictions on the model.

The point here is, you take a problem that may appear abstract mathematically and find a real world working solution to the problem, such as putting numbered balls into numbered bags.

Here is the engineering problem solving methodology.

1. Define the problem. List the specifications, limitations and restrictions.
2. Create a working solution after researching all possibilities.
3. Draw a flowchart.
4. Implement the solution.
5. Test the solution
That's it? I call that sorting since you are taking balls that are not in particular order and putting them into sequentially numbered bags.

Moreover, that is exactly what the algorithm and code in my post does. If you looked at it, it takes very few instructions in Assembly for the actual process. The "numbers" are RAM offsets..

The difference is that rather than "move" the balls to the bags, I simply "mark" the target bags and leave the balls where they were. It takes less code and fewer Tcy to read a mark (a set bit) than to move the contents and test for zero. It also allows for easy compression/packing to preserve RAM.

Moving the "ball" to the bag and determining whether the bag is empty is not only slower, the ball takes up space in the bag. My code with trivial modification (setting up the FSR to read/write program memory instead of RAM) could use an unused bit in, say, an existing, non-packed ascii table. Slightly more changes would allow me to store eight "balls" in a single register. That is similar to using only 8 registers to store 64 bits needed for a column in a 64x128 GLCD.

I do agree with you, it seems like a pretty efficient way in terms of speed. Your method wastes time and space by actually moving the balls.
 

WBahn

Joined Mar 31, 2012
32,877
If you still cannot do this on your own, here is a code snippet of the solution:

for (i = 0; i < n; i++ ) ++count( x(i) );
What is the function count() returning?

Why not count[x(i)]++ ?

For this approach to work, you have to use a hash function that is collision free. That's pretty restrictive -- in fact it largely defeats the purpose of using a hash function in most applications.

It's also quite restricting in this application, as well. Even with 16-bit unsigned integers for data values that's an array with 64k entries. But if you simply allow 32-bit unsigned integers for data now you need an array with 4G entries, all of which need to be initialized no matter how few the number of entries in the data array. That's a pretty hefty constant-time penalty to overcome. If the data are random integers on a 64-bit system, you are simply SOL unless you hash them down to a manageable space, but now you have to content with collisions, which means you need to maintain some kind of a list to be able to determine the actual values that hashed to the same index.

But all of this is WAY beyond where the TS is at.
 

jpanhalt

Joined Jan 18, 2008
11,087
Anything can be obfuscated and jargon only adds to the clutter. Try the logic I posted a few times along with the code in post #31. Despite denials by some, it is a simple sorting scheme appropriate for the problem you presented. If you want to extend it to 64-bit numbers, that is more than a bit different.
 

MrChips

Joined Oct 2, 2009
34,828
I have corrected the code snippet as follows:

for (i = 0; i < n; i++ ) ++count[ x(i) ];

I explained in the post that the hash function is simply the value of the number itself. There is no actual hash function involved. There is no sorting involved.

I also explained that if the values can be any 16-bit value then you would need 65536 bags (i.e. an array of 65536 entries).
I placed limits on the problem. For example, values will be from the integer set 1 to 100. This reduces the storage requirements.

All of the discussion was aimed directly at the TS. The TS is struggling with two concepts, (1) problem solving, and (2) drawing flowcharts. TS is doing things backwards. TS wants to write code and then attempts to match the flowchart to the code. This is backwards. Flowchart structure is methodical and consistent. Similarly, coding is methodical and consistent once the flowchart has been drawn.

I am trying to demonstrate to the TS that before you can draw the flowchart, you need to formulate a solution. In helps if you can create a real-life scenario, such as balls in a bag, in order to allow you to visualize and work through the steps of a solution. The only jargon that was added was the concept of "hashing" because this is the general solution to the problem with large data sets and large value ranges.
 

MrChips

Joined Oct 2, 2009
34,828
We seem to have distinctly different understandings of what "sort" means:
True, putting numbered balls into its respective containers would qualify as sorting in the human perspective.
When I think of sorting as a computer algorithm, I think of putting elements in an ordered list using multiple nested iterations.
 

WBahn

Joined Mar 31, 2012
32,877
I feel I am trying to solve a more difficult question than my ability. I made many efforts to solve it but it is not so easy as you say, first solve the problem inside the brain
Which is why I am trying to walk you incrementally to a workable solution.

Go back and do your best to accomplish the next step that was asked in: https://forum.allaboutcircuits.com/threads/flowchart-find-repeated-number.168406/post-1495131

Do it using C code, or pseudocode, or a flowchart.

Currently we have:

Code:
#include <stdio.h>

int main(void)
{
    int data[11] = {5, 3, 7, 1, 6, 3, 9, 12 , 21, 13, 16};
    sizeOfData = 11
    for ( int  i = 0;  i < sizeOfData;  i++)
          printf( "data = %d \n", data[i]);

    return 0;
}
Here's a hint: you want to use a nested loop (loop within a loop), you only need to add a single line of code that will be nearly identical to one of the lines you already have, and the output statement will be changed to something like:

printf(" "(%d, %d)\n", data, data[j]);
 
Top