
AURIX™ MCU: Bandwidth and timings of HSSL module peripheral in TC3xx and TC4xx series - KBA235817


BinduPriya_G
Community Manager

Version: **

The net bandwidth achievable with HSSL varies widely, from a small fraction up to ~82% of the maximum available bandwidth of 320 MBaud. Knowing the deterministic timings of the individual HSSL commands makes it possible to assess the feasibility of an intended application use case and to optimize it.

The timings summarized here are bare-metal timings: as far as possible, they exclude the less deterministic, application-specific software timings.

READ transaction timings

A READ request/response pair is triggered from the initiator’s side by writing to the Initiator Read Write Address register IRWA.ADDRESS of the respective channel. The processing within the initiator’s HSSL peripheral adds some latency before the READ frame appears on the wire. Once the last bit has appeared on the wire, the target’s HSSL peripheral processes the request and executes the read command, which adds further latency, and then responds with a READRESPONSE frame that is sent back to the initiator. After the initiator has processed the response, the transaction is completed by resetting the ICON.BSY flag of the respective channel. This is illustrated in the following figure.

BinduPriya_G_7-1659080001795.png
In the following table, the theoretical bus_active time is the calculated time that the bits occupy the wire at the given baud rate. The hardware latencies add further delays to the total transaction time needed to complete the request/response pair. The total transaction time for reading 32 bits is then set in relation to the baud rate to calculate the utilization of the bus:

 

BinduPriya_G_8-1659080052367.png


Note: The timings above do not include the register access time of the local CPU to configure and set up the HSSL READ command itself. These timings are left out intentionally because they depend on the software architecture and are application-specific. To roughly account for the access time to the HSSL registers, an add-on of approximately 0.35 µs is a good estimate for both the TC3xx and TC4xx families.

The table above shows that reducing the baud rate (160 MBaud, 80 MBaud) increases the hardware latencies. This increase does not scale linearly, because only a portion of the HSSL peripheral module operates at the selected baud rate, while a major part of the module still operates at the nominal peripheral module clock of 320 MHz. Therefore, the net bus utilization (in the range from ~11% to ~15%) is slightly better at lower baud rates than at higher baud rates.
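
The utilization figure in the table can be reproduced with a short calculation. The following is a minimal sketch that assumes only the definition given above: the payload bits are related to the number of bit times that elapse during the whole transaction. The function name and the 1.0 µs example transaction time are hypothetical placeholders, not values from the table; the 0.35 µs add-on is the register access estimate from the note.

```python
def net_bus_utilization(payload_bits, baud_rate, total_time_s):
    """Payload bits divided by the bit times that fit into the transaction."""
    if baud_rate <= 0 or total_time_s <= 0:
        raise ValueError("baud_rate and total_time_s must be positive")
    return payload_bits / (baud_rate * total_time_s)


# Estimated CPU register access add-on from the note above (seconds).
REGISTER_ACCESS_ADD_ON_S = 0.35e-6

# Hypothetical total transaction time; take the real value from the table.
example_total_time_s = 1.0e-6

bare_metal = net_bus_utilization(32, 320e6, example_total_time_s)
with_add_on = net_bus_utilization(
    32, 320e6, example_total_time_s + REGISTER_ACCESS_ADD_ON_S
)
print(f"bare metal: {bare_metal:.1%}, with register access: {with_add_on:.1%}")
```

Replacing `example_total_time_s` with a measured or tabulated transaction time gives the corresponding utilization for that baud rate.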

WRITE transaction timings

Similar to the READ command, a WRITE request/response pair is triggered from the initiator’s side by writing to the Initiator Read Write Address register IRWA.ADDRESS of the respective channel. The hardware latency characteristics are the same as those of the READ command. When the initiator has received and processed the ACK response from the target, the transaction is completed by resetting the ICON.BSY flag of the respective channel. This is illustrated in the following figure.

BinduPriya_G_9-1659080121613.png

 

In the following table, the theoretical bus_active time is the calculated time that the bits occupy the wire at the given baud rate. The hardware latencies add further delays to the total transaction time needed to complete the request/response pair. The total transaction time for writing 32 bits is then set in relation to the baud rate to calculate the utilization of the bus:

BinduPriya_G_10-1659080186427.png

Note: The timings above do not include the register access time of the local CPU to configure and set up the HSSL WRITE command itself. These timings are left out intentionally because they depend on the software architecture and are application-specific. To roughly account for the access time to the HSSL registers, an add-on of approximately 0.35 µs is a good estimate for both the TC3xx and TC4xx families.

Latencies and timings of a WRITE-transaction are largely comparable to those of a READ-transaction. Also, for WRITE-transactions, the net bandwidth utilization (in the range from ~11% to ~15%) is slightly better on lower baud rates than on higher baud rates.

STREAMING transaction timings

The Streaming mode is preferred to reach the maximum net bandwidth utilization. Within one STREAM-frame of 313 bits, a total payload of 256 bits is transferred as shown in the following figure.

BinduPriya_G_11-1659080237680.png

For a high number of STREAM frames, the overhead of the last ACK frame and the bus latencies can be ignored. In this case, the net bandwidth utilization approaches its maximum. To calculate it, only the STREAM frames and the payload they carry need to be considered:

                                  Max. net bandwidth utilization = 256 bits / 313 bits = 81.8%
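
As a quick cross-check of the figure above, the following sketch recomputes the ratio and converts it into an upper bound for the net data rate at each baud rate. It assumes, as the bus_active calculations in this article do, that one bit is transferred per baud; the constant and variable names are illustrative.

```python
STREAM_FRAME_BITS = 313    # total bits per STREAM frame (from the text)
STREAM_PAYLOAD_BITS = 256  # payload bits per STREAM frame (from the text)

max_utilization = STREAM_PAYLOAD_BITS / STREAM_FRAME_BITS
print(f"max. net bandwidth utilization: {max_utilization:.1%}")

# Upper bound of the net data rate, ignoring the ACK frame and all latencies.
for baud_rate in (320e6, 160e6, 80e6):
    net_rate_mbit = max_utilization * baud_rate / 1e6
    print(f"{baud_rate / 1e6:.0f} MBaud -> up to {net_rate_mbit:.1f} Mbit/s net")
```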

Before starting the streaming, ensure the following:

  • The source address is programmed into the ISSA0.START register of the initiator device
  • The destination address is programmed into the TSSA0.ADDR register of the target device
  • The frame count (transaction size) is programmed into the ISFC.RELCOUNT register (initiator) and the TSFC.RELCOUNT register (target)
  • MFLAGSSET.TSES is set to enable the target for receiving a streaming transaction

Once both devices are configured for the streaming transaction, the initiator triggers the streaming by setting MFLAGSSET.ISBS. Once the streaming is complete, the MFLAGS.ISB flag is reset by the initiator’s HSSL peripheral module, as shown in the following figure:

BinduPriya_G_6-1659079858323.png
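
The ordering of the steps above (configure both devices first, trigger last) can be sketched as follows. This is not driver code: `MockHsslRegisters` and `start_streaming` are hypothetical names, the register file is a plain in-memory stand-in, and the addresses and frame count are illustrative values only. The register field names are the ones listed above.

```python
class MockHsslRegisters:
    """Stand-in for one device's HSSL registers; records writes in order."""

    def __init__(self, name):
        self.name = name
        self.values = {}
        self.write_log = []

    def write(self, register, value):
        self.values[register] = value
        self.write_log.append(register)


def start_streaming(initiator, target, source, destination, frame_count):
    # Source and destination addresses.
    initiator.write("ISSA0.START", source)
    target.write("TSSA0.ADDR", destination)
    # Frame count (transaction size) on both sides.
    initiator.write("ISFC.RELCOUNT", frame_count)
    target.write("TSFC.RELCOUNT", frame_count)
    # Enable the target for receiving the streaming transaction.
    target.write("MFLAGSSET.TSES", 1)
    # Trigger last, only after both devices are configured.
    initiator.write("MFLAGSSET.ISBS", 1)


initiator = MockHsslRegisters("initiator")
target = MockHsslRegisters("target")
# Hypothetical addresses and frame count, for illustration only.
start_streaming(initiator, target,
                source=0x70000000, destination=0x70010000, frame_count=16)
print("initiator writes:", initiator.write_log)
print("target writes:   ", target.write_log)
```

On real hardware, the end of the transfer would be detected by waiting for the MFLAGS.ISB flag to be reset, which this mock does not model.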

Hardware latency overhead on streaming transactions

As shown in the figure above, for unidirectional streaming, hardware latency adds overhead to the total transaction time at two points: at the beginning, when the streaming is triggered, and at the end, when the last ACK frame is received and processed by the initiator.

The overall hardware latency duration is not influenced by the total transaction size, but it is influenced by the selected baud rate.

In the following tables, the theoretical bus_active time is the calculated time that the bits occupy the wire at the given baud rate. The hardware latencies add further delays to the total transaction time needed for the complete streaming transfer. The total transaction time for transferring the data carried inside the STREAM frames is then set in relation to the baud rate to calculate the net bandwidth utilization. The timings for 320 MBaud, 160 MBaud, and 80 MBaud can be found in separate tables below:

BinduPriya_G_3-1659079728668.png
BinduPriya_G_5-1659079778750.png

 

The following figure provides a good overview of the bus utilization as a function of the total transferred payload at different baud rates:

BinduPriya_G_2-1659079636283.png
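
The shape of such a curve follows from the fact that the latency overhead is fixed while the bus_active time grows with the number of frames. The sketch below illustrates this under stated assumptions: `fixed_overhead_s` lumps together the start latency, the final ACK frame, and the end latency into one hypothetical value of 1.0 µs, which is a placeholder and not a measured figure from the tables.

```python
STREAM_FRAME_BITS = 313    # total bits per STREAM frame (from the text)
STREAM_PAYLOAD_BITS = 256  # payload bits per STREAM frame (from the text)


def streaming_utilization(frame_count, baud_rate, fixed_overhead_s):
    """Net utilization of a stream with a size-independent time overhead."""
    if frame_count < 1:
        raise ValueError("frame_count must be at least 1")
    overhead_bit_times = fixed_overhead_s * baud_rate
    total_bit_times = frame_count * STREAM_FRAME_BITS + overhead_bit_times
    return frame_count * STREAM_PAYLOAD_BITS / total_bit_times


# Hypothetical lumped overhead; replace with the values from the tables.
HYPOTHETICAL_OVERHEAD_S = 1.0e-6

curve = {
    frames: streaming_utilization(frames, 320e6, HYPOTHETICAL_OVERHEAD_S)
    for frames in (1, 10, 100, 1000)
}
for frames, utilization in curve.items():
    print(f"{frames:5d} frames ({frames * 32} bytes payload): {utilization:.1%}")
```

With any positive overhead, the utilization rises with the frame count and stays below the 256/313 limit.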

 

Target setup time overhead on streaming transaction

To set up a streaming transaction, both the initiator and target need to be configured for the transfer. For setting up the target, typically the following registers need to be written:

  • Configuration register CFG
  • Target Streaming Start Address register TSSA0
  • Target Stream Frame Count register TSFC
  • Miscellaneous Flags Set register MFLAGSSET

From a software perspective, it is useful if the initiator takes full control of setting up the complete streaming transaction. For this purpose, the initiator can set up these registers on the target remotely, using four HSSL WRITE commands.

These four HSSL WRITE commands from the initiator to the target add a considerable time overhead to every streaming transaction. Therefore, this option and its impact on the net bus utilization are considered here:

The typical timings to set up and execute four HSSL WRITE commands are:

BinduPriya_G_1-1659079581685.png


Note: The timings above also include the typical register access time of the local CPU (~0.35 µs each) to the HSSL registers needed to set up the HSSL WRITE command itself on the initiator. In general, these timings depend on the software architecture and are application-specific. The timings provided here serve as a good estimate for both the TC3xx and TC4xx families.

Taking the overhead of the configuration HSSL WRITE commands from the initiator to the target into account, the following net bus utilization can be reached:

BinduPriya_G_0-1659079492112.png
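
The effect of the four setup WRITE commands can be estimated by adding their duration to the fixed overhead of the stream. The sketch below does this under stated assumptions: the function name is illustrative, and both `STREAM_OVERHEAD_S` and `WRITE_TIME_S` are hypothetical placeholder timings that should be replaced with the values from the tables above.

```python
def utilization_with_remote_setup(frame_count, baud_rate, stream_overhead_s,
                                  write_time_s, setup_writes=4):
    """Net utilization of a stream preceded by remote setup WRITE commands."""
    overhead_s = stream_overhead_s + setup_writes * write_time_s
    # 313 bits on the wire per STREAM frame, 256 of them payload (from the text).
    total_bit_times = frame_count * 313 + overhead_s * baud_rate
    return frame_count * 256 / total_bit_times


# Hypothetical timings, for illustration only.
STREAM_OVERHEAD_S = 1.0e-6  # lumped start/end latency of the stream
WRITE_TIME_S = 1.5e-6       # one WRITE command including register access

results = {}
for frames in (10, 100, 1000):
    without_setup = utilization_with_remote_setup(
        frames, 320e6, STREAM_OVERHEAD_S, WRITE_TIME_S, setup_writes=0)
    with_setup = utilization_with_remote_setup(
        frames, 320e6, STREAM_OVERHEAD_S, WRITE_TIME_S)
    results[frames] = (without_setup, with_setup)
    print(f"{frames:5d} frames: {without_setup:.1%} without setup, "
          f"{with_setup:.1%} with four setup WRITEs")
```

Because the setup cost is paid once per streaming transaction, its relative impact shrinks as the transaction size grows.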

 

 
