algorithm to find the mid points of datas.

bug13 · Apr 28, 2020

Hi team

Just wondering if there is a algorithm that can identify the mid points of two groups of data. These two group of data are not evenly distributed, so I can sum them up and find the average.

So here is what I mean:

C:

void main(){
   
    uint32_t data[256] = {0};
    uint8_t val = 0;
   
    for(;;){
        /* a magical function to read the positive
         * pulse of a eletronics signal. This is a
         * blocking function for simplicity.
         */
        val = readPositivePulseOfSignal();
        data[val]++;
    }
}

So say I run this code for a minutes or two, I got this data (these are real data):

C:

data[0] = 0;
data[1] = 0;
...
data[61] = 0;
data[62] = 0;
data[63] = 249750;
data[64] = 5307291;
data[65] = 9838050;
data[66] = 5433023;
data[67] = 219066;
data[68] = 0;
data[69] = 0;
...
data[91] = 0;
data[92] = 0;
data[93] = 23;
data[94] = 3178823;
data[95] = 32018044;
data[96] = 9601470;
data[97] = 44613;
data[98] = 0;
data[99] = 0;
...
data[254] = 0;
data[255] = 0;

Now I can count the empty ones and work out the mid-point is data[80], but how can I do it with an fast algorithm in real time? And maybe don't need to use that much ram?

jpanhalt · Apr 28, 2020

By "midpoint" do you mean "mean" or "median." Please try to use less ambiguous nomenclature. You mention "zeros." Are they included or excluded?

bug13 · Apr 28, 2020

jpanhalt said:
By "midpoint" do you mean "mean" or "median." Please try to use less ambiguous nomenclature. You mention "zeros." Are they included or excluded?

this is the mid-point that I am referring to. I know is definitely not "mean", possible not "median" as well. I am not sure what you mean by "zeros" included or exdluded.

C:

data[0] = 0;
data[1] = 0;
...
data[61] = 0;
data[62] = 0;
data[63] = 249750;
data[64] = 5307291;
data[65] = 9838050;
data[66] = 5433023;
data[67] = 219066;
data[68] = 0;
data[69] = 0;
...
data[80] = 0;     // this is the mid-point I am looking for
                // 12 zero count from data[67], and 12 zero
                  // count from data[93]
...
data[91] = 0;
data[92] = 0;
data[93] = 23;
data[94] = 3178823;
data[95] = 32018044;
data[96] = 9601470;
data[97] = 44613;
data[98] = 0;
data[99] = 0;
...
data[254] = 0;
data[255] = 0;

402DF855 · Apr 28, 2020

So data[] is basically a histogram of values read with readPositivePulseOfSignal. To find your "midpoint" you could either search the array for the first and last non-zero elements and use the average of those. You could integrate the array then find the index of the half-way value of the integral sum.

Since it's hard to imagine how this is used, it is difficult to make a recommendation.

jpanhalt · Apr 28, 2020

Median is the middle number in an ordered set. That is, there are just as many values greater than the median as less than the median:

Source: Wikipedia

If you want simply to find the largest number, e.g., 32018044 in the example you give, that is a different question. Mode is the most frequent number. There can be more than one mode.

If you simply want the "middle" number, that is yet another question. It could be 0 while all the other numbers are greater than zero. There is probably little use in finding the middle number, as simply getting one more result could drastically change it.

It appears you may actually be looking for the "maximum" number.

WBahn · Apr 28, 2020

bug13 said:

this is the mid-point that I am referring to. I know is definitely not "mean", possible not "median" as well. I am not sure what you mean by "zeros" included or exdluded.

C:

data[0] = 0;
data[1] = 0;
...
data[61] = 0;
data[62] = 0;
data[63] = 249750;
data[64] = 5307291;
data[65] = 9838050;
data[66] = 5433023;
data[67] = 219066;
data[68] = 0;
data[69] = 0;
...
data[80] = 0;     // this is the mid-point I am looking for
                // 12 zero count from data[67], and 12 zero
                  // count from data[93]
...
data[91] = 0;
data[92] = 0;
data[93] = 23;
data[94] = 3178823;
data[95] = 32018044;
data[96] = 9601470;
data[97] = 44613;
data[98] = 0;
data[99] = 0;
...
data[254] = 0;
data[255] = 0;

Is your data such that you are GUARANTEED that the array will consist of all zeroes except for two groupings of data that are all non-zero?

Will the span of the two groups always be the same (or very nearly the same)? If not, does that affect what the "midpoint" is?

How accurate does your "midpoint" need to be? If you had a quick algorithm that produced 78 or 82, would that be good enough?

bug13 · Apr 28, 2020

402DF855 said:
So data[] is basically a histogram of values read with readPositivePulseOfSignal. To find your "midpoint" you could either search the array for the first and last non-zero elements and use the average of those. You could integrate the array then find the index of the half-way value of the integral sum.

Since it's hard to imagine how this is used, it is difficult to make a recommendation.

Those are the measurement of the non-standard signal, it looks like this:
|__|¯¯| bit 0, positive pulse is about 320us
|_|¯¯¯| bit 1, positive pulse is about 480us
And then I need to assemble the bits into bytes, and then bytes into packets.

The mid-point is for deciding the bit is 1 or 0, so I can assemble it into a byte. What I am doing now just manually figure the mid-poiint and hardcode it into my code, and it works fine in my test. I am just thinking if I can calculate the mid-point in real time, I can make my decoding better?

Just another random thought to improve my code

bug13 · Apr 28, 2020

jpanhalt said:
Median is the middle number in an ordered set. That is, there are just as many values greater than the median as less than the median:
View attachment 205655
Source: Wikipedia

If you want simply to find the largest number, e.g., 32018044 in the example you give, that is a different question. Mode is the most frequent number. There can be more than one mode.

If you simply want the "middle" number, that is yet another question. It could be 0 while all the other numbers are greater than zero. There is probably little use in finding the middle number, as simply getting one more result could drastically change it.

It appears you may actually be looking for the "maximum" number.

Definitely not maximum number

jpanhalt · Apr 28, 2020

Re: Post #7

Are you asking for "thresholds?" Like, if the value is >X, it is assigned a 1 and if <=X it is assigned a 0?

bug13 · Apr 28, 2020

WBahn said:
Is your data such that you are GUARANTEED that the array will consist of all zeroes except for two groupings of data that are all non-zero?

Will the span of the two groups always be the same (or very nearly the same)? If not, does that affect what the "midpoint" is?

How accurate does your "midpoint" need to be? If you had a quick algorithm that produced 78 or 82, would that be good enough?

1) I can't GUARANTEED in a real world (noise etc...), but for this example, lets say I can GUARANTEED that.
2) the span of the two groups will be very nearly the same (depending the clock of the other device I have no control off), if the the span of the two groups shift, my "midpoint" need to shift accordingly.
3) How accurate does your "midpoint" need to be? 78 or 82 is more than good enough, but 79 or 80 is better.

PS 2), if the span shift, it will shift very very very slowly I would think, as it mainly depending on room temperature, which affect the speed of the RC clock.

bug13 · Apr 28, 2020

jpanhalt said:
Re: Post #7

Are you asking for "thresholds?" Like, if the value is >X, it is assigned a 1 and if <=X it is assigned a 0?

Yes, I guess "thresholds" is the word I should be using in the first place.

jpanhalt · Apr 28, 2020

bug13 said:
Yes, I guess "thresholds" is the word I should be using in the first place.

In Assembly, that is very easy to do. You can even three-state. That is, higher, lower, or in range. I suspect it is as easy to do in C too. It is pretty standard.

Edit: Here's a link to the PICList page: http://www.piclist.com/techref/microchip/compcon.htm
There is also some code in C on PicList, but I didn't find it.

WBahn · Apr 28, 2020

bug13 said:
Those are the measurement of the non-standard signal, it looks like this:
|__|¯¯| bit 0, positive pulse is about 320us
|_|¯¯¯| bit 1, positive pulse is about 480us
And then I need to assemble the bits into bytes, and then bytes into packets.

The mid-point is for deciding the bit is 1 or 0, so I can assemble it into a byte. What I am doing now just manually figure the mid-poiint and hardcode it into my code, and it works fine in my test. I am just thinking if I can calculate the mid-point in real time, I can make my decoding better?

Just another random thought to improve my code

So, from your diagram, it looks like a 0 bit is a cycle that has about 50% duty cycle and a 1 bit is a cycle that has about 75% duty cycle? What is the specification on these?

Can't you just use that?

How long are the packets? What does the framing on each packet look like?

bug13 · Apr 28, 2020

jpanhalt said:
In Assembly, that is very easy to do. You can even three-state. That is, higher, lower, or in range. I suspect it is as easy to do in C too. It is pretty standard.

I am interested in how you do that?

jpanhalt · Apr 28, 2020

Hi bug13,

I just added the link to my previous post. There is some code in C on PicList too.

bug13 · Apr 28, 2020

WBahn said:
So, from your diagram, it looks like a 0 bit is a cycle that has about 50% duty cycle and a 1 bit is a cycle that has about 75% duty cycle? What is the specification on these?

Can't you just use that?

How long are the packets? What does the framing on each packet look like?

In my actual application, it's 50% duty cycle and 75% duty. And I can decode that no problem. The packet is start on 5 set of 0xFF 0x00. The length of the packet is encoded in the second byte, it's dynamic length.

I guess I am think beyond my application, what if someone (in extreme case) put in a 1MHz crystal instead of 2MHz cyrstal, all my measured data are shifted, can my code still find the "mid-point" the decode the data properly?

WBahn · Apr 29, 2020

What is the maximum length of a packet?

There are two common approaches to problems like this. The simplest is to set specifications on min/max frequency on the transmitter and max packet length such that the receiver, who is operating within the receiver's min/max frequency, detects that start of a packet and then can assume where the sampling point is and they will be close enough so that they won't encounter a framing error until beyond the max packet length.

The other is to do some kind of clock recovery and synchronization at the receiver. If want to track a 2x change in frequency, that's what you'll need to do. There are several ways of doing it and your waveform should be pretty easy to do.

Similar threads	Forum	Replies	Date
Looking for the name of an algorithm	Machine Learning, AI & Neural Networks	4	Feb 15, 2024
Algorithm to sample and interpret scintillator pulses	General Science, Physics & Math	7	Sep 19, 2023
Soft start using triac	PCB Layout , EDA & Simulations	2	Dec 5, 2022
Can Not Find Sine Algorithm, Help Needed	General Science, Physics & Math	30	Jul 5, 2021
Algorithm to find a minimal CNF/DNF for a boolean expression	Programming & Languages	4	Sep 21, 2014

algorithm to find the mid points of datas.

Join our Engineering Community! Sign-in with:

algorithm to find the mid points of datas.

bug13

jpanhalt

bug13

402DF855

jpanhalt

WBahn

bug13

bug13

jpanhalt

bug13

bug13

jpanhalt

WBahn

bug13

jpanhalt

bug13

WBahn

You May Also Like

PCIe 7.0 Roundup: Test and Timing Tools Emerge as Ecosystem Takes Shape

HDLs Are Software (and It’s Crazy We’re Still Arguing About It)

Infineon Intros ‘Virtual Oscilloscope’ for Digital Power Conversion

Maxlinear Adds Serial Transceivers for Harsh Industrial Applications