algorithm to find the mid points of datas.

Thread Starter

bug13

Joined Feb 13, 2012
2,002
Hi team

Just wondering if there is a algorithm that can identify the mid points of two groups of data. These two group of data are not evenly distributed, so I can sum them up and find the average.

So here is what I mean:
C:
void main(){
   
    uint32_t data[256] = {0};
    uint8_t val = 0;
   
    for(;;){
        /* a magical function to read the positive
         * pulse of a eletronics signal. This is a
         * blocking function for simplicity.
         */
        val = readPositivePulseOfSignal();
        data[val]++;
    }
}
So say I run this code for a minutes or two, I got this data (these are real data):
C:
data[0] = 0;
data[1] = 0;
...
data[61] = 0;
data[62] = 0;
data[63] = 249750;
data[64] = 5307291;
data[65] = 9838050;
data[66] = 5433023;
data[67] = 219066;
data[68] = 0;
data[69] = 0;
...
data[91] = 0;
data[92] = 0;
data[93] = 23;
data[94] = 3178823;
data[95] = 32018044;
data[96] = 9601470;
data[97] = 44613;
data[98] = 0;
data[99] = 0;
...
data[254] = 0;
data[255] = 0;
Now I can count the empty ones and work out the mid-point is data[80], but how can I do it with an fast algorithm in real time? And maybe don't need to use that much ram?
 

jpanhalt

Joined Jan 18, 2008
11,087
By "midpoint" do you mean "mean" or "median." Please try to use less ambiguous nomenclature. You mention "zeros." Are they included or excluded?
 

Thread Starter

bug13

Joined Feb 13, 2012
2,002
By "midpoint" do you mean "mean" or "median." Please try to use less ambiguous nomenclature. You mention "zeros." Are they included or excluded?
this is the mid-point that I am referring to. I know is definitely not "mean", possible not "median" as well. I am not sure what you mean by "zeros" included or exdluded.

C:
data[0] = 0;
data[1] = 0;
...
data[61] = 0;
data[62] = 0;
data[63] = 249750;
data[64] = 5307291;
data[65] = 9838050;
data[66] = 5433023;
data[67] = 219066;
data[68] = 0;
data[69] = 0;
...
data[80] = 0;     // this is the mid-point I am looking for
                // 12 zero count from data[67], and 12 zero
                  // count from data[93]
...
data[91] = 0;
data[92] = 0;
data[93] = 23;
data[94] = 3178823;
data[95] = 32018044;
data[96] = 9601470;
data[97] = 44613;
data[98] = 0;
data[99] = 0;
...
data[254] = 0;
data[255] = 0;
 

402DF855

Joined Feb 9, 2013
271
So data[] is basically a histogram of values read with readPositivePulseOfSignal. To find your "midpoint" you could either search the array for the first and last non-zero elements and use the average of those. You could integrate the array then find the index of the half-way value of the integral sum.

Since it's hard to imagine how this is used, it is difficult to make a recommendation.
 

jpanhalt

Joined Jan 18, 2008
11,087
Median is the middle number in an ordered set. That is, there are just as many values greater than the median as less than the median:
1588113195703.png
Source: Wikipedia


If you want simply to find the largest number, e.g., 32018044 in the example you give, that is a different question. Mode is the most frequent number. There can be more than one mode.

If you simply want the "middle" number, that is yet another question. It could be 0 while all the other numbers are greater than zero. There is probably little use in finding the middle number, as simply getting one more result could drastically change it.

It appears you may actually be looking for the "maximum" number.
 
Last edited:

WBahn

Joined Mar 31, 2012
29,978
this is the mid-point that I am referring to. I know is definitely not "mean", possible not "median" as well. I am not sure what you mean by "zeros" included or exdluded.

C:
data[0] = 0;
data[1] = 0;
...
data[61] = 0;
data[62] = 0;
data[63] = 249750;
data[64] = 5307291;
data[65] = 9838050;
data[66] = 5433023;
data[67] = 219066;
data[68] = 0;
data[69] = 0;
...
data[80] = 0;     // this is the mid-point I am looking for
                // 12 zero count from data[67], and 12 zero
                  // count from data[93]
...
data[91] = 0;
data[92] = 0;
data[93] = 23;
data[94] = 3178823;
data[95] = 32018044;
data[96] = 9601470;
data[97] = 44613;
data[98] = 0;
data[99] = 0;
...
data[254] = 0;
data[255] = 0;
Is your data such that you are GUARANTEED that the array will consist of all zeroes except for two groupings of data that are all non-zero?

Will the span of the two groups always be the same (or very nearly the same)? If not, does that affect what the "midpoint" is?

How accurate does your "midpoint" need to be? If you had a quick algorithm that produced 78 or 82, would that be good enough?
 

Thread Starter

bug13

Joined Feb 13, 2012
2,002
So data[] is basically a histogram of values read with readPositivePulseOfSignal. To find your "midpoint" you could either search the array for the first and last non-zero elements and use the average of those. You could integrate the array then find the index of the half-way value of the integral sum.

Since it's hard to imagine how this is used, it is difficult to make a recommendation.
Those are the measurement of the non-standard signal, it looks like this:
|__|¯¯| bit 0, positive pulse is about 320us
|_|¯¯¯| bit 1, positive pulse is about 480us
And then I need to assemble the bits into bytes, and then bytes into packets.

The mid-point is for deciding the bit is 1 or 0, so I can assemble it into a byte. What I am doing now just manually figure the mid-poiint and hardcode it into my code, and it works fine in my test. I am just thinking if I can calculate the mid-point in real time, I can make my decoding better?

Just another random thought to improve my code :)
 

Thread Starter

bug13

Joined Feb 13, 2012
2,002
Median is the middle number in an ordered set. That is, there are just as many values greater than the median as less than the median:
View attachment 205655
Source: Wikipedia


If you want simply to find the largest number, e.g., 32018044 in the example you give, that is a different question. Mode is the most frequent number. There can be more than one mode.

If you simply want the "middle" number, that is yet another question. It could be 0 while all the other numbers are greater than zero. There is probably little use in finding the middle number, as simply getting one more result could drastically change it.

It appears you may actually be looking for the "maximum" number.
Definitely not maximum number :)
 

Thread Starter

bug13

Joined Feb 13, 2012
2,002
Is your data such that you are GUARANTEED that the array will consist of all zeroes except for two groupings of data that are all non-zero?

Will the span of the two groups always be the same (or very nearly the same)? If not, does that affect what the "midpoint" is?

How accurate does your "midpoint" need to be? If you had a quick algorithm that produced 78 or 82, would that be good enough?
1) I can't GUARANTEED in a real world (noise etc...), but for this example, lets say I can GUARANTEED that.
2) the span of the two groups will be very nearly the same (depending the clock of the other device I have no control off), if the the span of the two groups shift, my "midpoint" need to shift accordingly.
3) How accurate does your "midpoint" need to be? 78 or 82 is more than good enough, but 79 or 80 is better.

PS 2), if the span shift, it will shift very very very slowly I would think, as it mainly depending on room temperature, which affect the speed of the RC clock.
 
Last edited:

jpanhalt

Joined Jan 18, 2008
11,087
Yes, I guess "thresholds" is the word I should be using in the first place.
In Assembly, that is very easy to do. You can even three-state. That is, higher, lower, or in range. I suspect it is as easy to do in C too. It is pretty standard.

Edit: Here's a link to the PICList page: http://www.piclist.com/techref/microchip/compcon.htm
There is also some code in C on PicList, but I didn't find it.
 
Last edited:

WBahn

Joined Mar 31, 2012
29,978
Those are the measurement of the non-standard signal, it looks like this:
|__|¯¯| bit 0, positive pulse is about 320us
|_|¯¯¯| bit 1, positive pulse is about 480us
And then I need to assemble the bits into bytes, and then bytes into packets.

The mid-point is for deciding the bit is 1 or 0, so I can assemble it into a byte. What I am doing now just manually figure the mid-poiint and hardcode it into my code, and it works fine in my test. I am just thinking if I can calculate the mid-point in real time, I can make my decoding better?

Just another random thought to improve my code :)
So, from your diagram, it looks like a 0 bit is a cycle that has about 50% duty cycle and a 1 bit is a cycle that has about 75% duty cycle? What is the specification on these?

Can't you just use that?

How long are the packets? What does the framing on each packet look like?
 

Thread Starter

bug13

Joined Feb 13, 2012
2,002
So, from your diagram, it looks like a 0 bit is a cycle that has about 50% duty cycle and a 1 bit is a cycle that has about 75% duty cycle? What is the specification on these?

Can't you just use that?

How long are the packets? What does the framing on each packet look like?
In my actual application, it's 50% duty cycle and 75% duty. And I can decode that no problem. The packet is start on 5 set of 0xFF 0x00. The length of the packet is encoded in the second byte, it's dynamic length.

I guess I am think beyond my application, what if someone (in extreme case) put in a 1MHz crystal instead of 2MHz cyrstal, all my measured data are shifted, can my code still find the "mid-point" the decode the data properly?
 

WBahn

Joined Mar 31, 2012
29,978
What is the maximum length of a packet?

There are two common approaches to problems like this. The simplest is to set specifications on min/max frequency on the transmitter and max packet length such that the receiver, who is operating within the receiver's min/max frequency, detects that start of a packet and then can assume where the sampling point is and they will be close enough so that they won't encounter a framing error until beyond the max packet length.

The other is to do some kind of clock recovery and synchronization at the receiver. If want to track a 2x change in frequency, that's what you'll need to do. There are several ways of doing it and your waveform should be pretty easy to do.
 
Top