Help Me Troubleshoot Circuit Glitches

Thread Starter

mcardoso

Joined May 19, 2020
226
Hi All,

Thanks to the generous help of many on this site, I have successfully designed and built a digital circuit that interfaces an industrial robot's proprietary encoder signals to a set of Allen-Bradley servo drives. Amazingly to me, the circuit really does work for the most part. I am having a couple of issues which I think are related to timing glitches, and I am hopeful some on here might be able to help me identify where the glitches come from and how I might go about fixing them.

1593379953825.png

Circuit Description:
The circuit looks at 3 inputs (A, B, & C) and generates 4 additional outputs (Z, S1, S2, & S3). A and B cycle through 4 states (00, 01, 11, 10), and the C input can be high or low during each of those states. The edges are aligned: every rise or fall of C coincides with a rise or fall of A or B. Depending on the state of C during each A/B state, the circuit evaluates the appropriate outputs. These signals come from the motor encoder and can have any frequency from 0 to 700 kHz.

Problem Description:
The circuit generates the correct outputs, but two types of glitches occur on them. The first is a very brief LOW or HIGH pulse when the output should be steady. These glitches appear to be a single clock period wide and happen at random times. The second seems to occur only when an output is HIGH: it causes the output to fall LOW for 3 quadrature counts before rising HIGH again (depending on motor speed, this could be a very short or a very long time; the encoder has 8192 counts per revolution).

Theories:
My working theory is that both types of glitches occur at the rising and falling edges of the input signals, when the clock pulse arrives at a bad time. For example, let's say A is HIGH and B is rising, and at the same time C is falling. At this exact moment, the clock samples the inputs. It registers A and B as HIGH, but C has not yet fallen low enough and is incorrectly sampled HIGH. This passes incorrect information into the circuit, a different pattern is detected, and the output glitches high or low. On the next clock pulse, the inputs are sampled again and the error is corrected, giving a glitch one clock period wide on the output.

The worse case is when the sample occurs on the falling edge of a quadrature count. Let's use the same example as before: A is HIGH, B is rising and C is falling. This time the clock pulse arrives a few nanoseconds earlier and the inputs are sampled. A is sampled as HIGH, B has not risen high enough so it is sampled as LOW, and C has fallen low enough to be sampled as LOW. In this case the state we are exiting (A high, B low) is sampled incorrectly and a glitch is passed into the circuit. Unfortunately, on the next clock pulse B has risen HIGH and we are in a new state (A high, B high). The erroneous sample stays latched in memory until 3 quadrature counts have occurred and the first state (A high, B low) is presented again and sampled correctly. I think this is what is occurring in the pictures below.
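A rough way to see the mechanism in this theory is to give each channel a slightly different effective switching time (threshold and routing skew) and look at what a single clock edge captures. The Python sketch below uses made-up skew values (EDGE_NS and CHANNELS are hypothetical) and ignores metastability itself. It only shows that one clock edge can capture a mix of the old and new levels, a combination that never actually exists on the wires.

```python
# Illustrative model with hypothetical numbers: each channel's nominal edge is
# at EDGE_NS, but the flip-flop effectively "sees" the new level only after a
# per-channel skew. Metastability is not modelled; this only shows how skew
# lets one clock edge capture a mix of old and new levels.

EDGE_NS = 100.0  # B rises and C falls at the same nominal instant

# name: (level before the edge, level after the edge, effective skew in ns)
CHANNELS = {
    "A": (1, 1, 0.0),   # A stays HIGH
    "B": (0, 1, 2.0),   # B rising, recognized 2 ns late
    "C": (1, 0, -1.0),  # C falling, recognized 1 ns early
}


def capture(t_ns):
    """Levels an ideal (never metastable) flip-flop would latch at time t_ns."""
    return {
        name: after if t_ns >= EDGE_NS + skew else before
        for name, (before, after, skew) in CHANNELS.items()
    }


for t in (98.0, 100.5, 103.0):
    print(t, capture(t))
# t = 98.0 ns captures the old state  (A=1, B=0, C=1)
# t = 100.5 ns captures a mixed state (A=1, B=0, C=0), never present on the inputs
# t = 103.0 ns captures the new state (A=1, B=1, C=0)
```

The middle capture is the "A HIGH, B LOW, C LOW" sample from the example above: each bit is individually valid, but the combination is not.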

I think this is an issue with sampling a parallel asynchronous signal (the encoder input) into a synchronous clocked circuit: I cannot control what the inputs are doing when the clock samples them, but bad samples are not tolerable on the output (they fault the servo drive and cause poor motor performance).

Circuit Architecture:
The circuit is combinational logic with clocked flip-flops that store the states of previous inputs. The inputs are double-buffered by D-type flip-flops before being presented to the combinational logic, and the logic outputs are buffered again before leaving the circuit.

What I am asking for:
1) I want to understand why these glitches occur.
2) How does one begin to decompose the circuit to troubleshoot it?
3) Is there a way to sample an asynchronous signal that avoids sampling on its rising and falling edges? For example, by waiting a delay after a detected edge.
4) Would a faster or slower clock help? My signal is 0-700kHz and my clock is 8MHz.

Images:
#1: A clean transition of output S1 (Channel D)
IMG_9220.jpg

#2: A 3 count glitch on the output S1 (Channel D). This should have remained HIGH.

IMG_9222.jpg

#3: Two separate 3 count glitches on output S1 (Channel D). It should have remained HIGH the entire time.

IMG_9224.jpg

Attachments:
If you want to see interactive circuit logic, go here: https://www.falstad.com/circuit/

You can open the circuit in a web browser, but I prefer to download the offline version (linked below the applet window). I have attached a circuit file below that can be opened in this applet. Alt-click and drag to pan. This is a functional schematic; it doesn't show how the gates are arranged in the chips, the power wiring, or the transceivers at each end of the circuit.

I have also attached a PDF of the EAGLE schematic that fully represents the components used in the circuit. It might not be as easy to read as the simulation schematic mentioned above, but it does show the full detail of the circuit.

I appreciate any suggestions, either general or specific to what I am working on. Everyone here has been very supportive of my endeavors and it is greatly appreciated. I have a lot to learn.
 


Thread Starter

mcardoso

Joined May 19, 2020
226
Some more thinking and reading leads me to believe that this is a metastability issue with the synchronizing D-Type flip flops at the very beginning of my circuit.

Since the incoming signal is completely asynchronous to the clock of my circuit, the setup and hold times of the flip flops are easily violated. This isn't a big deal for the A & B inputs since only one can change at a time and it doesn't matter which value the unstable signal settles on. What matters is that the C channel properly evaluates in sync with the A and B channels (this is a parallel data bus).

How can we design the sampling of a parallel data bus to avoid metastability problems when there is no control over the incoming signal and erroneous samples are not tolerable? Added delay and discarded samples are tolerable.
 

Thread Starter

mcardoso

Joined May 19, 2020
226
I wanted to follow up with one illustrated example of my issue:

1.jpg

We have 3 signals A, B, C. A and B are square waves 90 degrees out of phase (in this case, A leading B). C is a signal that is the inverse of A (this is not always the case, but the rise and fall always correspond to the rising or falling edges of A or B). This is essentially a parallel data bus.

Case 1: The clock samples right at line X1. The setup and hold times of the synchronizer are violated and metastability occurs on input B. Since neither A nor C changes at this boundary, it does not matter which value B settles on.

Case 2: The clock samples right at line X2. The setup and hold times of the synchronizer are violated and metastability occurs on inputs A and C. In this case, as long as A and C settle together, there is no issue. However, remember that channel C represents NOT A. If the metastability settles and the inputs are sampled as A HIGH and C HIGH, then we have a failure of the circuit. The same is true if both A and C are sampled as LOW.

Case 2 is the one I think is causing my circuit real problems and I need to figure out how to avoid it.
 

crutschow

Joined Mar 14, 2008
34,280
One way to minimize metastability effects is to run the signal through two flip-flops (below).
As long as the instability of the first flip-flop is done before the next clock period, the second flip-flop will not experience metastability.
Of course that will delay the asynchronous input by at least one and up to two clock periods.

1593438586753.png
 

Thread Starter

mcardoso

Joined May 19, 2020
226
Thank you so much for the reply. I have done this in my circuit, as shown by the loopbacks on U2 and U3 (for example, U2 pin 2 connected to pin 5).

1593442101649.png

1593442010091.png

I think this prevents a metastability event from entering the circuit; however, what it does not control is two inputs settling on different values. In the simplest case, if A, B, & C were identical square waves perfectly in phase, then every sample should have A = B = C. If a metastability event occurred and A was not equal to B or C, then the circuit would fail. This is essentially the same issue I have to deal with (except my signals are a bit different).

Does this make sense? I'm struggling to find the right words to describe the issue.

Is there any way to detect when a metastability event has occurred so I could stop the clock from triggering the second flip flop and potentially introducing bad data into the circuit?
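The problem described here can be illustrated with a cycle-level model of the two-flip-flop synchronizer crutschow described. This is a sketch under a strong assumption: when an input changes inside the first flop's setup/hold window, that flop resolves to 0 or 1 at random (a coin flip), independently for each bit, and settles before the next clock edge. TwoFlopSync and split_rate are illustrative names, not parts of the schematic.

```python
import random


class TwoFlopSync:
    """Cycle-level sketch of one bit of a two-flip-flop synchronizer.

    Assumption: when the input changes inside ff1's setup/hold window, ff1
    resolves to 0 or 1 at random but settles before the next clock edge, so
    ff2 itself never goes metastable.
    """

    def __init__(self, init=0):
        self.ff1 = init
        self.ff2 = init

    def clock(self, d, in_window=False, rng=random):
        self.ff2 = self.ff1  # ff2 captures ff1's value from before this edge
        self.ff1 = rng.choice((0, 1)) if in_window else d
        return self.ff2


def split_rate(trials=10_000, seed=1):
    """Fraction of trials in which A and C (where C = NOT A) come out equal."""
    rng = random.Random(seed)
    splits = 0
    for _ in range(trials):
        a, c = TwoFlopSync(1), TwoFlopSync(0)
        # A falls and C rises exactly at a clock edge: both bits hit the window.
        a.clock(0, in_window=True, rng=rng)
        c.clock(1, in_window=True, rng=rng)
        # Next edge: the inputs are stable, ff2 takes whatever ff1 resolved to.
        if a.clock(0, rng=rng) == c.clock(1, rng=rng):
            splits += 1
    return splits / trials


print(split_rate())  # roughly 0.5 under the coin-flip assumption
```

Each bit's synchronizer does its job (ff2 never sees an unresolved level), yet about half of the edges that catch both A and C in their windows produce A = C. The real probability depends on the flip-flops, but the point stands: per-bit synchronizers alone do not keep a multi-bit bus coherent.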
 

Thread Starter

mcardoso

Joined May 19, 2020
226
I've read a few papers on this subject last night and this morning, and it seems that the expected rate of metastability events is approximately the signal frequency * clock frequency * (setup + hold time).

For my circuit (worst case) this is 700 kHz * 8 MHz * 4 ns = 22.4 kHz.

This means that there is going to be (on average) a metastability event roughly every 357 clock cycles (about every 45 µs) when the signal is coming in at full speed. This is unacceptable for my application, since the control of the motor that this signal comes from relies on clean output data from my circuit.
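A quick arithmetic check of this estimate in Python, using the figures from this post (the variable names are just labels):

```python
# Worst-case figures from the post above.
f_data = 700e3    # encoder signal frequency, Hz
f_clk = 8e6       # synchronizer clock, Hz
t_window = 4e-9   # setup + hold window of the input flip-flops, s

rate = f_data * f_clk * t_window   # clock edges landing in the window, per second
cycles_per_event = f_clk / rate
print(f"{rate / 1e3:.1f} kHz, i.e. one event every {cycles_per_event:.0f} "
      f"clock cycles ({1e6 / rate:.1f} us)")
# 22.4 kHz, i.e. one event every 357 clock cycles (44.6 us)
```

This counts clock edges that land inside the setup/hold window, i.e. captures that can go metastable. In the usual synchronizer model, the rate of actual failures at the second flip-flop is lower still, because the chance that the first flip-flop is still unresolved falls off exponentially with the settling time it is given before being sampled again.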

It looks like this can be reduced by A) reducing the clock frequency, B) finding a flip-flop with a shorter setup + hold time, or C) finding some circuit to detect a metastability event and throw away that sample.

A) is pretty easy and I think I could reduce the clock from 8 MHz to 2MHz without risking issues with sampling the 700kHz signal.

B) I tried looking for better flip-flops (5V CMOS) and haven't found any yet. I'm currently using the Texas Instruments "AC" logic family.

C) I have no idea, but this seems like the best solution. Can anyone provide insight?
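Before lowering the clock, it may be worth checking how many clock periods fit inside one quadrature state. This sketch assumes the 700 kHz figure is the frequency of each channel (A or B), so each encoder cycle contains 4 states; if 700 kHz is instead the count rate, pass states_per_cycle=1. clocks_per_state is a hypothetical helper.

```python
F_ENC = 700e3  # worst-case encoder frequency from the post, Hz


def clocks_per_state(f_clk, f_enc=F_ENC, states_per_cycle=4):
    """Clock periods available during one quadrature state.

    Assumption: f_enc is the frequency of each channel (A or B), so every
    encoder cycle passes through 4 states: 00, 01, 11, 10.
    """
    state_s = 1.0 / (states_per_cycle * f_enc)
    return state_s * f_clk


for f_clk in (8e6, 2e6):
    print(f"{f_clk / 1e6:.0f} MHz: {clocks_per_state(f_clk):.2f} clocks per state")
# 8 MHz: 2.86 clocks per state
# 2 MHz: 0.71 clocks per state
```

Under the per-channel assumption, at 2 MHz a state lasts less than one clock period, so some states could be missed entirely; if 700 kHz is the count rate, 2 MHz still gives about 2.86 clocks per state. Any scheme that needs two agreeing samples per state would need each state to last at least two clock periods.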
 

Thread Starter

mcardoso

Joined May 19, 2020
226
I'll add this graphic I found in a discussion of the issue.

1593446153771.png

I think this is exactly the error I am getting. There is a lot online discussing the issue, but I don't see solutions yet.
 

crutschow

Joined Mar 14, 2008
34,280
Here's a thought off the top of my head.
If you sample at a high rate (as high as the needed resolution for a signal change), then you can look at consecutive samples to see if A and C have changed.
If A changes on one sample and C changes on the next (or vice versa), then there was a metastability, and you ignore the first sample and use the second.
That way the A and C signal changes should always be properly detected.

Do you see a problem with doing that?
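One possible reading of this suggestion, as a Python sketch: only update the output when two consecutive samples agree, so a sample where A has changed but C has not yet (or vice versa) is dropped. agree_filter is a hypothetical name, and the sketch assumes every valid state lasts at least two clock periods.

```python
def agree_filter(samples):
    """Update the output only when two consecutive samples agree.

    samples: tuples such as (A, C) captured on successive clock edges.
    Returns one output per sample; the output holds its previous value
    whenever the newest sample differs from the one before it.
    """
    if not samples:
        return []
    out = []
    prev = accepted = samples[0]
    for s in samples:
        if s == prev:
            accepted = s  # confirmed by two consecutive samples
        out.append(accepted)
        prev = s
    return out


# A falls and C rises, but one clock edge catches A new and C old: (0, 0).
samples = [(1, 0), (1, 0), (0, 0), (0, 1), (0, 1)]
print(agree_filter(samples))
# [(1, 0), (1, 0), (1, 0), (1, 0), (0, 1)]
```

The invalid (0, 0) sample never reaches the output; the cost is one extra clock of delay on every change.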
 

Thread Starter

mcardoso

Joined May 19, 2020
226
I really like this idea, have to think about the simplest way to implement it!
 

crutschow

Joined Mar 14, 2008
34,280
How about this: whenever a change of state is detected on any line, wait for the next clock pulse to read the data.
That should give the correct value whether a metastable condition has occurred or not.

An XOR gate can be used to detect a change of state between the new data and the previously latched data, since it will output a logic 1 only when the two inputs are different.
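The XOR change detector can be sketched on a packed 3-bit word. The bit assignment of A, B and C below is an assumption for illustration; in hardware this is one 2-input XOR per line, with the XOR outputs ORed together to flag a change on any line.

```python
# Hypothetical bit packing of the three inputs into one word.
A, B, C = 0b100, 0b010, 0b001


def changed_bits(new, latched):
    """Bitwise XOR: 1 in every position where the new sample differs from the latched one."""
    return new ^ latched


latched = A | C      # A=1, B=0, C=1
new = A | B | C      # B has just risen
mask = changed_bits(new, latched)
print(f"{mask:03b}", mask != 0)  # 010 True -> a change: wait one clock and read again
```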
 

Thread Starter

mcardoso

Joined May 19, 2020
226
Here is my 3rd iteration of a 3 bit synchronizer.

This circuit adds (3) 2-Input XNOR gates (shown as an XOR and an inverter), (1) 3-Input AND Gate, and (3) 2-Input MUXs.

The circuit holds the output state whenever the input and the state latched between the flip-flops disagree (essentially discarding the state latched between the flip-flops). In the next clock cycle, if the input and the latched state are the same for all channels, then the output is set.

There are 3 cases:

1) No input change occurs, the signals are passed through to the outputs.

2) Input signal change occurs and the latched state is discarded. On the next clock cycle, the input and latched state agree and the data passes through. This can occur when no metastability occurred, or when metastability occurred and settled on the correct values (we know this because the latched values match the inputs at the next clock cycle).

3) Input signal change occurs and the latched state is discarded. On the next clock cycle, the input and latched state disagree and the latched data is again discarded. This occurs when metastability occurred and at least one of the latched values settled on the incorrect value (compared to the second sample of the input).
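A cycle-level Python sketch of this hold logic, as I read the description: an XNOR per channel compares the live input with the first-rank flip-flop, a 3-input AND checks that all channels agree, and a 2:1 MUX either passes the first rank to the output flip-flops or holds them. It assumes the live input is settled at the moment of comparison (the timing concern about propagation delay into the second flip-flop is not modelled), and a bad capture is injected by hand. HoldSynchronizer and its parameters are hypothetical names.

```python
class HoldSynchronizer:
    """Cycle-level sketch of the XNOR / AND / MUX hold circuit, as read from the post.

    ff1: first rank of flip-flops (the state latched between the flip-flops)
    out: second rank of flip-flops (the synchronizer output)
    Assumption: the live input is settled when it is compared with ff1; a bad
    capture is modelled by passing a `captured` value that differs from it.
    """

    def __init__(self, init):
        self.ff1 = tuple(init)
        self.out = tuple(init)

    def clock(self, live, captured=None):
        captured = live if captured is None else captured
        # 2-input XNOR per channel, then a 3-input AND: do all channels agree?
        agree = all(not (x ^ y) for x, y in zip(live, self.ff1))
        # 2:1 MUX feeding the output flops: pass ff1 through, or hold the output.
        self.out = self.ff1 if agree else self.out
        self.ff1 = tuple(captured)
        return self.out


sync = HoldSynchronizer((1, 0, 1))    # (A, B, C)
steps = [
    ((1, 0, 1), None),                # case 1: no change
    ((1, 1, 0), (1, 0, 0)),           # B rises, C falls; ff1 latches a bad mix
    ((1, 1, 0), None),                # live input and ff1 still disagree: hold
    ((1, 1, 0), None),                # input and ff1 agree: output updates
]
print([sync.clock(live, cap) for live, cap in steps])
# [(1, 0, 1), (1, 0, 1), (1, 0, 1), (1, 1, 0)]
```

The bad mix (1, 0, 0) captured on the second edge never reaches the output, which updates one clock after the input and the latched state agree.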

1593541264645.png

My first attempt was to stop the clock on the second set of flip flops, but this got messy and had a bunch of race conditions.

I think this circuit works pretty well, but I see one issue. If the input changes well in advance of the clock for the first flip-flop, the propagation delay through the combinational logic could cause the MUX to switch and violate the setup/hold time of the second flip-flop, causing metastability on the output. A simple solution would be to add a third flip-flop to the end of each synchronizer. I have to think more about this, and it might not be an issue because the input signal is slow and wide compared to the clock. Given this, whenever the input and the latched state disagree, the latched state and the output state should already be the same, so it should not matter which one the MUX selects.

Attached is the Falstad circuit file.
 


Thread Starter

mcardoso

Joined May 19, 2020
226
Wanted to post an update to the couple of threads I had started related to this circuit. I updated my design to include the circuit modification discussed above. This involved adding (1) 4-channel 2-input XNOR gate, (1) 3-channel 3-input AND gate, and (1) 4-channel 2:1 multiplexer. These chips work together to detect when an input has changed state and prevent that sample from entering the circuit. The outputs of the circuit can be updated only when 2 successive input readings agree.

The circuit still functions with the changes I made and all the glitches seem to be gone! I really appreciate all the help and feedback from everyone here.
 