# TRAX, a Tweakable Block Cipher

`Trax-L-17`

(pronounced “T-rax”) is a tweakable block cipher operating on a large 256-bit block. It applies a 256-bit key and 128-bit tweak. To the best of our knowledge, the only
other large tweakable block cipher is `Threefish`

which was used as a building for
the SHA-3 candidate Skein. Unlike this cipher, `Trax-L-17`

uses 32-bit words that
are better suited for vectorized implementation as well as on micro-controllers.
Another improvement of `Trax-L-17`

over `Threefish`

is the fact that we provide strong bounds for the probability of all linear trails and all (related-tweak) differential trails. Because of its Substitution-Permutation Network structure, `Trax`

is indeed inherently easier to analyze. Such a large tweakable block cipher can provide robust authenticated encryption, meaning that it can retain a high security level even in case of nonce misuse or in the presence of quantum adversaries, as argued in Saturnin specification. The performance penalty of such guarantees can be minimized using vectorization and/or parallelism.

A tweakable block cipher lends itself well to a parallelizable mode of operation
such as ΘCB, a variant of OCB which saves its complex overhead needed to
turn a regular block cipher into a tweakable one. Since our block size is equal to
256 bits, attacks relying on collisions obtained via the birthday paradox are non-
issue with `Trax-L-17`

. Some modes such as the Synthetic Counter-in-Tweak
can retain a security level up to the birthday bound in case of nonce misuse. As
suggested in Saturnin specification, using a 256-bit block cipher can also help providing post-
quantum security in cases where the attacker is given a lot of power (e.g. if the
primitive runs on a quantum computer).

## Description

The following figure captures the high-level structure of one step in `Trax-L`

. Here, \(A_c\) denotes the ARX-box Alzette with the constant \(c\); \(\ell'\) is a linear bijection of \((\mathbb{F}_2^{16})^4\) defined as

Below, we provide a reference implementation of `Trax-L-17`

in C.

```
#define ROT(x, n) (((x) >> (n)) | ((x) << (32-(n))))
#define ALZETTE(x, y, c) \
(x) += ROT((y), 31), (y) ^= ROT((x), 24), \
(x) ^= (c), \
(x) += ROT((y), 17), (y) ^= ROT((x), 17), \
(x) ^= (c), \
(x) += (y), (y) ^= ROT((x), 31), \
(x) ^= (c), \
(x) += ROT((y), 24), (y) ^= ROT((x), 16), \
(x) ^= (c)
#define ALZETTE_INV(x, y, c) \
(x) ^= (c), \
(y) ^= ROT((x), 16), (x) -= ROT((y), 24), \
(x) ^= (c), \
(y) ^= ROT((x), 31), (x) -= (y), \
(x) ^= (c), \
(y) ^= ROT((x), 17), (x) -= ROT((y), 17), \
(x) ^= (c), \
(y) ^= ROT((x), 24), (x) -= ROT((y), 31)
#define NSTEPS 17
#define ELL(x) (ROT(((x) ^ ((x) << 16)), 16))
static const uint32_t RCON[8] = { \
0xB7E15162, 0xBF715880, 0x38B4DA56, 0x324E7738, \
0xBB1185EB, 0x4F7C7B57, 0xCFBFA1C8, 0xC2B3293D };
void traxl17_genkeys_ref(uint32_t *subkeys, const uint32_t *key)
{
uint32_t key_[8], tmp;
int b, s; // branch and step counter
memcpy(key_, key, 32);
for (s = 0; s < NSTEPS+1; s++) {
// assign 8 sub-keys
for (b = 0; b < 8; b++)
subkeys[8*s+b] = key_[b];
// update master-key
key_[0] += key_[1] + RCON[(2*s)%8];
key_[2] ^= key_[3] ^ s;
key_[4] += key_[5] + RCON[(2*s+1)%8];
key_[6] ^= key_[7] ^ ((uint32_t) (s << 16));
// rotate master-key
tmp = key_[0];
for (b = 1; b < 8; b++)
key_[b-1] = key_[b];
key_[7] = tmp;
}
}
void traxl17_enc_ref(uint32_t *x, uint32_t *y, const uint32_t *subkeys,
const uint32_t *tweak)
{
uint32_t tmpx, tmpy;
int b, s; // branch and step counter
for (s = 0; s < NSTEPS; s++) {
// add tweak to state if step-counter is odd
if ((s % 2) == 1) {
x[0] ^= tweak[0]; y[0] ^= tweak[1];
x[1] ^= tweak[2]; y[1] ^= tweak[3];
}
// add subkeys to state and execute ALZETTEs
for (b = 0; b < 4; b++) {
x[b] ^= subkeys[8*s+2*b];
y[b] ^= subkeys[8*s+2*b+1];
ALZETTE(x[b], y[b], RCON[(4*s+b)%8]);
}
// linear layer (see Sparkle256 permutation)
tmpx = ELL(x[2] ^ x[3]); y[0] ^= tmpx; y[1] ^= tmpx;
tmpy = ELL(y[2] ^ y[3]); x[0] ^= tmpy; x[1] ^= tmpy;
tmpx = x[0]; x[0] = x[3]; x[3] = x[1]; x[1] = x[2]; x[2] = tmpx;
tmpy = y[0]; y[0] = y[3]; y[3] = y[1]; y[1] = y[2]; y[2] = tmpy;
}
// add subkeys to state for final key addition
for (b = 0; b < 4; b++) {
x[b] ^= subkeys[8*NSTEPS+2*b];
y[b] ^= subkeys[8*NSTEPS+2*b+1];
}
}
void traxl17_dec_ref(uint32_t *x, uint32_t *y, const uint32_t *subkeys,
const uint32_t *tweak)
{
uint32_t tmpx, tmpy;
int b, s; // branch and step counter
// add subkeys to state for initial key addition
for (b = 0; b < 4; b++) {
y[b] ^= subkeys[8*NSTEPS+2*b+1];
x[b] ^= subkeys[8*NSTEPS+2*b];
}
for (s = NSTEPS-1; s >= 0; s--) {
// inverse linear layer (see Sparkle256 permutation)
tmpy = y[0]; y[0] = y[2]; y[2] = y[1]; y[1] = y[3]; y[3] = tmpy;
tmpx = x[0]; x[0] = x[2]; x[2] = x[1]; x[1] = x[3]; x[3] = tmpx;
tmpy = ELL(y[2] ^ y[3]); x[0] ^= tmpy; x[1] ^= tmpy;
tmpx = ELL(x[2] ^ x[3]); y[0] ^= tmpx; y[1] ^= tmpx;
// add subkeys to state and execute inverse ALZETTEs
for (b = 0; b < 4; b++) {
ALZETTE_INV(x[b], y[b], RCON[(4*s+b)%8]);
y[b] ^= subkeys[8*s+2*b+1];
x[b] ^= subkeys[8*s+2*b];
}
// add tweak to state if step-counter is odd
if ((s % 2) == 1) {
y[1] ^= tweak[3]; x[1] ^= tweak[2];
y[0] ^= tweak[1]; x[0] ^= tweak[0];
}
}
}
```

## Performance

In summary, Trax-L-17 can be used in SCT mode to provide 128 bits of
security in case of nonce-misuse, and its large block size can frustrate some
quantum attacks when used in the same mode as `Saturnin`

: it can be used to
offer a very robust authenticated encryption. On a Cortex-M3 microcontroller,
the generation of subkeys, encryption, and decryption has an execution time
of 925, 2435, and 2464 clock cycles, respectively. These results are based on a
standard C implementation, whereby we determined the execution times with
help of the cycle-accurate instruction set simulator of Keil Microvision v5.24.2
using a generic Cortex-M3 model as target device. For comparison, the currently
fastest implementation of `Saturnin`

has a (simulated) encryption time of 5494
cycles and a decryption time of 5489 cycles on a generic Cortex-M3 device. 10
Consequently, `Trax-L-17`

outperforms `Saturnin`

by a factor of more than 2.2.

The use of 32-bit operations implies that it is possible to vectorize the com- putation of several parallel Trax-L-17 instances on many platforms, meaning that its speed can be multiplied whenever e.g. AVX instructions are available.