10ns signal/event timestamper

Thread Starter

jars121

Joined Mar 8, 2019
3
Hi all,

I'm currently updating and improving an existing design. The architecture of the updated design is as follows:
  • Multiple Cortex A cores running Linux.
  • Multiple (homogeneous) Cortex M cores running FreeRTOS (SMP).

The Cortex M cores interface with peripherals on the board and perform real-time data acquisition and pre-processing. Data is then passed to the Linux user space environment (Cortex A cores) with RPMsg.

What I'm looking to implement is a signal/event timestamping capability, and I'm aiming for a 10ns resolution. When a signal or event is captured within FreeRTOS, a timestamp is captured alongside it for processing in Linux. The timestamp is used both for general timestamping purposes (i.e. understanding the relative timing of various inputs, signals, events, etc.) as well as part of the digital input processing function (i.e. measuring pulse width, period, frequency, etc. of digital inputs).
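As a rough illustration of the digital input processing described above, here is a minimal sketch in C. It assumes a free-running 32-bit counter clocked at 100 MHz (one tick per 10 ns); the function names and the `TICK_NS` constant are made up for this example and are not from the existing design.

```c
#include <stdint.h>

/* Assumption: a 100 MHz counter, i.e. one tick every 10 ns. */
#define TICK_NS 10u

/* Elapsed ticks between two captures of a free-running 32-bit counter.
 * Unsigned subtraction is modulo 2^32, so one rollover between the two
 * captures is handled without any special casing. */
static inline uint32_t ticks_between(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Pulse width in nanoseconds from rising-edge and falling-edge captures. */
static inline uint64_t pulse_width_ns(uint32_t rise, uint32_t fall)
{
    return (uint64_t)ticks_between(rise, fall) * TICK_NS;
}

/* Frequency in Hz from two consecutive rising-edge captures.
 * Returns 0.0 if both captures are identical (period of zero ticks). */
static inline double frequency_hz(uint32_t rise0, uint32_t rise1)
{
    uint32_t period = ticks_between(rise0, rise1);
    return period ? 1e9 / ((double)period * TICK_NS) : 0.0;
}
```

Note that a 32-bit count at 10 ns per tick wraps roughly every 42.9 s, so the subtraction trick only holds for intervals shorter than that; longer intervals need a wider count.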

On the current version of the design, the timestamping function is built using multiple cascading/chained timer counters within the FreeRTOS-based MCU which is also doing all the real-time data acquisition. This works to an extent, but due to the frequency of the peripheral clock and the associated TC dividers, the resolution isn't quite what I'm after. Furthermore, given the amount of context switching and ISR handling going on in the MCU, there is noticeable jitter in the resultant timestamp as well. I'm looking for a more deterministic solution, with as little overhead for the Cortex M cores as possible.

With that context out of the way, these are some of the options I've been considering. The list is by no means validated or exhaustive, and I'd very much appreciate any insight, suggestions or clarification.

  1. Use a 32-bit binary counter (probably a dual 16-bit chained counter) IC with a parallel or serial output (I2C/SPI) interface to read the current count whenever a signal/event is captured. I know the SN74 series can be used for this purpose, but I'm not sure if I can reliably achieve 100MHz operation, and the latency associated with reading the count via an 8-bit parallel or (relatively slow) serial interface isn't ideal.
  2. Use a small MCU and use the same TC method as above, with an additional output to interrupt the primary Cortex M cores when a rollover occurs. As with the above option, reading from the standalone MCU via a serial interface introduces additional latency; perhaps have the small MCU write to a small asynchronous dual-port RAM module?
  3. Use a CPLD/FPGA. This approach would obviously provide the best timing performance/reliability, and would give flexibility as to how the timestamps are read by the main Cortex M cores, but potentially comes with additional complexity. I could use a high speed serial interface to minimise read latency from the Cortex M cores (e.g. 50MHz SPI), but I had also considered a 32-bit parallel output from the CPLD/FPGA, mapped to a 32-bit wide GPIO register in the Cortex M core. This adds space and layout complexity to the design, but in theory, would allow the Cortex M to read the 32-bit count with a single read of the GPIO port's memory address. Is that feasible?
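To make the "single read plus rollover interrupt" idea from options 2 and 3 concrete, here is a minimal sketch in C. It assumes the 32-bit count is readable at some memory-mapped address (for example a GPIO input data register) and that a rollover ISR increments a separate 32-bit variable; `read_timestamp64`, `counter` and `rollovers` are hypothetical names for illustration only.

```c
#include <stdint.h>

/* Combine an externally supplied 32-bit count with a software rollover
 * count into a 64-bit timestamp.
 *
 * Assumptions (not from the thread):
 *   counter   - points at the location where the 32-bit count is readable
 *   rollovers - a variable incremented by the rollover interrupt handler
 */
static uint64_t read_timestamp64(const volatile uint32_t *counter,
                                 const volatile uint32_t *rollovers)
{
    uint32_t hi, lo;

    do {
        hi = *rollovers;           /* upper 32 bits (software)          */
        lo = *counter;             /* lower 32 bits (single 32-bit read) */
    } while (hi != *rollovers);    /* retry if a rollover landed mid-read */

    return ((uint64_t)hi << 32) | lo;
}
```

Two caveats apply to this sketch. If the rollover ISR is delayed, the low word can already have wrapped while the high word is stale, so the ISR needs a high priority. Also, a parallel count that changes asynchronously to the reading core can be sampled mid-transition, so the source would likely need to latch the value or present it Gray-coded.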

Are there any other options I haven't considered?

Thanks!
 

MrChips

Joined Oct 2, 2009
30,706
Since I am already an STM32 ARM user I would choose that chip.

You can get STM32 with 550MHz core frequency. They have built-in 32-bit counter modules so you don't need any external hardware to do time-stamping.

In fact I am already doing time-stamping with 10ns resolution.
 

Thread Starter

jars121

Joined Mar 8, 2019
3
Thanks for your input. That certainly sounds like a good option. Accessing the count is where I come unstuck; perhaps a high speed serial interface (50MHz SPI as mentioned above) would be a good compromise.
 

MrChips

Joined Oct 2, 2009
30,706
Why bother to use SPI when you can read the counter register directly in parallel with one instruction?
 

Thread Starter

jars121

Joined Mar 8, 2019
3
I would likely use a CP counter as a high resolution timing source.
https://developer.arm.com/documentation/ka001406/latest

https://interrupt.memfault.com/blog/profiling-firmware-on-cortex-m
Let me explain how the system works: CYCCNTENA enables a cycle counter in the DWT unit of your microcontroller. This counter is incremented every CPU cycle (i.e. 168 million times per second).
Ah I had seen something about this and forgot to include it as an option. So once enabled, it's as simple as reading the relevant register address to access the current cycle; that makes perfect sense. I don't suppose there's a watermark or overflow interrupt available as well?

I'm using Cortex M4 cores @ 266MHz; does this mean I'd have access to the full 266MHz cycle rate, or are dividers incorporated?
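For reference, a minimal sketch in C of enabling and scaling the DWT cycle counter discussed above. The three register addresses and the two bit positions are the architecturally defined ARMv7-M ones; the function names are made up here, and the registers are passed in as pointers purely so the logic can also be exercised off-target.

```c
#include <stdint.h>

/* Architecturally defined ARMv7-M addresses and bits. */
#define DEMCR_ADDR          0xE000EDFCu  /* Debug Exception and Monitor Control */
#define DWT_CTRL_ADDR       0xE0001000u  /* DWT control register                */
#define DWT_CYCCNT_ADDR     0xE0001004u  /* DWT cycle count register            */
#define DEMCR_TRCENA        (1u << 24)   /* enables the DWT unit                */
#define DWT_CTRL_CYCCNTENA  (1u << 0)    /* enables CYCCNT                      */

/* Enable the cycle counter and reset it to zero.
 * On target, pass the addresses above cast to (volatile uint32_t *). */
static void cyccnt_enable(volatile uint32_t *demcr,
                          volatile uint32_t *dwt_ctrl,
                          volatile uint32_t *dwt_cyccnt)
{
    *demcr     |= DEMCR_TRCENA;
    *dwt_cyccnt = 0u;
    *dwt_ctrl  |= DWT_CTRL_CYCCNTENA;
}

/* Convert a cycle count to nanoseconds for a given core clock in Hz.
 * core_hz must be non-zero. */
static uint64_t cycles_to_ns(uint32_t cycles, uint32_t core_hz)
{
    return ((uint64_t)cycles * 1000000000ull) / core_hz;
}
```

CYCCNT is documented as incrementing on every processor clock cycle with no divider, so at 266MHz one count is about 3.76ns and the 32-bit register wraps after roughly 16 seconds. As far as I know there is no dedicated overflow interrupt for it, and each core has its own DWT unit, so counts taken on different cores in an SMP setup should not be assumed to be synchronised.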
 