# How to calculate throughput?

#### Joe24

Joined May 18, 2007
52
Hello,

Ok so I have a question about throughput..

So let's say I have a micro-controller. This micro-controller has been programmed to take in input blocks of 128 bits each, 32 bits at a time. So it takes 4 clock cycles to clock in 128 bits using a 32-bit input bus. The number of input blocks is variable; it can be 1 block or 2,000 blocks. But the output is fixed at 64 bits in length, no matter how big the input is. So it takes 2 clock cycles to output 64 bits using a 32-bit output bus.

Now let's say that I input one 128-bit block, and it takes 60 clock cycles for the micro-controller to process a 128-bit block and produce a 64-bit output. Let's also say that the system clock frequency is 20 MHz. How do I calculate the throughput?

Is it: (1 / 20 MHz) × 60 clock cycles = 3 µs (the time to process one block)

1 s / 3 µs = how many blocks could be processed in 1 second = 333,333
And each block is 128 bits, so 128 bits × 333,333 blocks/s ≈ 42,666,624 bits per second

So throughput ≈ 42.67 Mbit/s

Is this how it would be calculated??

Or instead of using the 128 bits in the calculation, do I have to use the 64-bit output instead?

E.g.

64 bits × 333,333 blocks/s = 21,333,312 bits per second => throughput ≈ 21.33 Mbit/s
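The two candidate figures above can be checked with a short script. This is only a sketch of the arithmetic in the post, counting the 60 processing cycles and ignoring I/O time; the constant and function names are made up for illustration.

```python
# Figures taken from the post; names are illustrative only.
CLOCK_HZ = 20_000_000     # 20 MHz system clock
CYCLES_PER_BLOCK = 60     # processing cycles only, I/O cycles not counted
INPUT_BITS = 128          # bits per input block
OUTPUT_BITS = 64          # bits per output


def blocks_per_second(clock_hz, cycles_per_block):
    """Blocks completed per second if each block costs a fixed number of cycles."""
    return clock_hz / cycles_per_block


rate = blocks_per_second(CLOCK_HZ, CYCLES_PER_BLOCK)
input_mbps = rate * INPUT_BITS / 1e6    # throughput counted in input bits
output_mbps = rate * OUTPUT_BITS / 1e6  # throughput counted in output bits

print(f"{rate:.0f} blocks/s, {input_mbps:.2f} Mbit/s in, {output_mbps:.2f} Mbit/s out")
# prints: 333333 blocks/s, 42.67 Mbit/s in, 21.33 Mbit/s out
```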

What about latency?? Do I have to factor in the time it takes to clock in the input block, and clock out the output? So instead of using 60 total clock cycles, I use 4 + 60 + 2 = 66 total clock cycles??

Am I close??

Thanks..

#### n9352527

Joined Oct 14, 2005
1,198
For a sequential, non-pipelined system such as this, the throughput calculation has to include the time to clock in all the input data, process the data, and clock out all the output data. So it would be 66 clock cycles per result, assuming that the input length is one block. If the system has to wait for complete input blocks before starting to process the data, then a longer input would need correspondingly more clock cycles. In this system, the latency is equal to 1/throughput, with throughput expressed in results per second.
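As a rough sketch of the non-pipelined case with a single input block, using the cycle counts from the question (the function and variable names are hypothetical):

```python
# Cycle counts from the question; names are illustrative only.
CLOCK_HZ = 20_000_000   # 20 MHz system clock
LOAD_CYCLES = 4         # 128 bits over a 32-bit input bus
PROCESS_CYCLES = 60     # processing one block
UNLOAD_CYCLES = 2       # 64 bits over a 32-bit output bus


def sequential_timing(clock_hz, load, process, unload):
    """Return (total cycles, latency in seconds, results per second)
    for a system that finishes one result before starting the next."""
    total_cycles = load + process + unload
    latency_s = total_cycles / clock_hz
    throughput = 1 / latency_s   # no overlap, so throughput = 1 / latency
    return total_cycles, latency_s, throughput


cycles, latency_s, throughput = sequential_timing(
    CLOCK_HZ, LOAD_CYCLES, PROCESS_CYCLES, UNLOAD_CYCLES
)
print(f"{cycles} cycles, {latency_s * 1e6:.2f} us latency, {throughput:.0f} results/s")
```

With these numbers the result is 66 cycles, 3.3 µs per block, and roughly 303,030 results per second, instead of the 333,333 obtained when I/O time is left out.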

If the system has some sort of pipelining or stages, then the throughput could be made higher (approaching the throughput of the slowest stage), and it might no longer be equal to 1/latency. E.g., clocking in the next input block while the current one is being processed would reduce the cycles required per block but leave the latency more or less constant.
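To illustrate, here is an idealized model of the pipelined case. It assumes the load, process and unload stages overlap perfectly (with enough buffering between them), which the real micro-controller may not support; the function name is hypothetical.

```python
# Idealized pipeline model; perfect stage overlap is an assumption.
CLOCK_HZ = 20_000_000
STAGE_CYCLES = [4, 60, 2]   # load, process, unload (from the question)


def pipelined_timing(clock_hz, stage_cycles):
    """Return (latency in seconds, steady-state results per second).

    Latency is still the sum of all stages, but in steady state a new
    result appears every max(stage_cycles) cycles, so the slowest stage
    sets the throughput.
    """
    latency_s = sum(stage_cycles) / clock_hz
    throughput = clock_hz / max(stage_cycles)
    return latency_s, throughput


latency_s, throughput = pipelined_timing(CLOCK_HZ, STAGE_CYCLES)
print(f"latency {latency_s * 1e6:.2f} us, throughput {throughput:.0f} results/s")
```

Under this assumption the latency stays at 66 cycles (3.3 µs) while the throughput rises to about 333,333 results per second, so throughput is no longer 1/latency.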

#### Joe24

Joined May 18, 2007
52
Hello,

So what about the actual calculation itself? Do I use the 128-bit input block or the 64-bit output to calculate the throughput? It would be easy, of course, if the system had the same size input and output, say a 64-bit input and a 64-bit output. In that case I would use 64 bits in my calculation. But in my case, I'm not too confident in choosing one over the other.

Thanks much.

#### n9352527

Joined Oct 14, 2005
1,198
I would tend to use the output data (64 bits) instead of the input data. However, there is no hard and fast rule for this. It depends on what the throughput number would be used for. For example, a 32-bit adder takes two 32-bit values and produces one 32-bit value with an overflow bit. If the output data is used, then the throughput would be half of what it is when the input data is used. In this case, an operations-per-second figure might be more appropriate.
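A small sketch of the adder example shows how far the three figures diverge. The rate of one addition per cycle at 20 MHz is purely an assumption for illustration, and the overflow bit is left out to match the "half" comparison.

```python
# Hypothetical adder: one 32-bit addition per clock cycle at 20 MHz (assumed).
ADDS_PER_SECOND = 20_000_000
INPUT_BITS_PER_ADD = 2 * 32   # two 32-bit operands
OUTPUT_BITS_PER_ADD = 32      # one 32-bit sum, overflow bit ignored

input_based_mbps = ADDS_PER_SECOND * INPUT_BITS_PER_ADD / 1e6
output_based_mbps = ADDS_PER_SECOND * OUTPUT_BITS_PER_ADD / 1e6

print(f"{ADDS_PER_SECOND / 1e6:.0f} M operations/s")
print(f"{input_based_mbps:.0f} Mbit/s counted on inputs")
print(f"{output_based_mbps:.0f} Mbit/s counted on outputs")
```

The same hardware reports 1280 Mbit/s or 640 Mbit/s depending on which side is counted, while the operations-per-second number is unambiguous.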

Just ask yourself what the significance of the throughput value is and how it would be used.