# Closing timing around a hard multiplier

#### tindel

Joined Sep 16, 2012
716
I've been using a MAX10 device for prototyping a closed loop system. This is my first real FPGA project. I'm using an ADC to measure a value for a control loop. Due to the control loop, I have to do some multiplication. My controller has two stages, each doing multiplication. I've found that I have had to put 2, 3, or 10 cycles of delay between the two stages to get timing to close. But when I look at the skews, this doesn't make sense. There also isn't much time for timing to close (only 239ps, with 2 delay cycles max) which I don't believe is very robust - would prefer 2ns of setup time slack or more.

Input value -> Controller Stage 1 -> Controller Stage 2 -> Output Control value
The controller iteration clock is 50kHz - compared to the main clock of 50MHz.

Below is a picture of their IP block. Note however, that I'm not using their IP block, only the hard multiplier using an "out = data_a * data_b;" command - this synthesizes to using the hard multiplier). It appears the hard multiplier is asynchronous and I'm using an output register on all four stages. When I look at the violations it always ends up being violated by the information delay from Controller Stage 1 to Controller Stage 2.

I think my main question is why isn't my slack improving by 20ns per clock delay if the update rate of my controller is every 20us?

The code is simplified as follows:
Code:
assign dataout = data_A * data_B;
always @(posedge clk or negedge reset_n) begin  // clk is 50MHz
if (~reset_n)
out <= 0;
else if (dlyd_update[DLY_CLK])  // update rate is 50kHz, with 1/50MHz * DLY_CLK cycles
out <= dataout;
end

#### andrewmm

Joined Feb 25, 2011
323

#### tindel

Joined Sep 16, 2012
716
I tried removing the reset, but it doesn't appear to be the root cause, nor does adding delays increase the slack by 20ns.

#### Analog Ground

Joined Apr 24, 2019
397
It has been awhile but I recall your situation is called a "multicycle path" for timing analysis. There is a way to tell the timing analyzer the data path between registers is more than the default number of cycles (which is usually 1). Perhaps you could search the Quartus user guide for this topic. Why don't you use the multiplier IP and the options in the Megafunction Wizard? You can specify input and output registers and to use the hard multipliers. Are you going for max portability?

#### andrewmm

Joined Feb 25, 2011
323
I tried removing the reset, but it doesn't appear to be the root cause, nor does adding delays increase the slack by 20ns.
General point, Resets, think local not global,
ONLY reset things you need to, resets add product terms / routing congestion.

I only just noted, your also enabling the registers,
try getting rid of that as well, see what it does,

These MAC's are funny things
The built in registers of the MAC are required to get speed,
The MAC will easily perform a multiply at 50 MHz,
so no need nor advantage to use a slow enable,
inferring MACs is great, but you need to follow exactly the examples,
as the tools can be flacky about extracting from your code to a hard MAC.

Remember adding registers does not change the slack , it changes the delay though,
have you simulated to prove the function is working as you expect ?

Also remember about the tools doing register push back / duplication, and use of IOB registers.

Can you post your full code and test bench to show what your up to or make a test case that shows what you have that you cna share. The snippet is ok, but open to too many other effects ,