Abstract
Salsa20 is a stream cipher that generates a 64-byte keystream block (referred to here as the encryption key) from a 64-byte input. The input contains a 32-byte key, an 8-byte nonce (pseudo-random bytes), 16 bytes of constants (used for expanding the key), and an 8-byte block counter (incremented for each 64-byte block of a message). The input is passed to a pseudo-hash function, which applies twenty rounds of addition, rotation, and XOR. The output of the rounds is then added word by word to the original input, and the result becomes the encryption key for the current message block. Because the key is XORed with the message block, the same key also serves as the decryption key.
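As an illustration of the key generation described above, the following is a minimal Python sketch. It assumes the word layout and the "expand 32-byte k" constants of the published Salsa20 specification for 32-byte keys. The function names (rotl32, quarterround, salsa20_block, salsa20_xor) are chosen here for illustration only and do not correspond to the Verilog datapath or to the software solution used in this project.

```python
import struct

MASK = 0xFFFFFFFF  # all arithmetic is on 32-bit words

def rotl32(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) & MASK) | (x >> (32 - n))

def quarterround(y0, y1, y2, y3):
    """One Salsa20 quarter-round: add, rotate, XOR on four words."""
    y1 ^= rotl32((y0 + y3) & MASK, 7)
    y2 ^= rotl32((y1 + y0) & MASK, 9)
    y3 ^= rotl32((y2 + y1) & MASK, 13)
    y0 ^= rotl32((y3 + y2) & MASK, 18)
    return y0, y1, y2, y3

# Word indices touched by the column round and the row round.
COLUMNS = ((0, 4, 8, 12), (5, 9, 13, 1), (10, 14, 2, 6), (15, 3, 7, 11))
ROWS = ((0, 1, 2, 3), (5, 6, 7, 4), (10, 11, 8, 9), (15, 12, 13, 14))

def salsa20_block(key, nonce, counter):
    """Return the 64-byte key block for one 64-byte message block."""
    assert len(key) == 32 and len(nonce) == 8
    sigma = struct.unpack("<4I", b"expand 32-byte k")  # 16-byte constants
    kw = struct.unpack("<8I", key)
    n = struct.unpack("<2I", nonce)
    t = (counter & MASK, (counter >> 32) & MASK)       # 8-byte block counter
    state = [sigma[0], *kw[:4], sigma[1], *n, *t, sigma[2], *kw[4:], sigma[3]]
    x = list(state)
    for _ in range(10):  # 10 double rounds = 20 rounds
        for a, b, c, d in COLUMNS + ROWS:
            x[a], x[b], x[c], x[d] = quarterround(x[a], x[b], x[c], x[d])
    # Feed-forward: add the original input to the output of the rounds.
    out = [(xi + si) & MASK for xi, si in zip(x, state)]
    return struct.pack("<16I", *out)

def salsa20_xor(key, nonce, data):
    """Encrypt or decrypt data by XORing it with successive key blocks."""
    out = bytearray()
    for i in range(0, len(data), 64):
        block_key = salsa20_block(key, nonce, i // 64)
        out.extend(b ^ k for b, k in zip(data[i:i + 64], block_key))
    return bytes(out)
```

Calling salsa20_xor a second time with the same key and nonce recovers the original text, which is why one key block serves for both encryption and decryption.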
In this project, the performance of a hardware solution and a software solution of the Salsa20 algorithm is investigated by comparing the two when encrypting a large text file. A pipelined datapath is designed for Salsa20 and described in Verilog. Altera tools are used to synthesize the Verilog description of the hardware. The propagation delay reported by the synthesis tools is used in a Simics simulation for performance analysis. An existing software solution for Salsa20 serves as the baseline for comparison with the hardware solution.
Both the hardware and the software solutions are used in a simulated machine to encrypt a large text file. The CPU statistics of the simulated machine are used to compare the performance of the two solutions. The performance data show that the Salsa20 HW is about 70 times faster than the Salsa20 SW. Except for the first encryption key, key generation is overlapped with the encryption of 64 bytes of the text. Additional overhead is incurred when initializing the Salsa20 HW, incrementing the block counter, and reading the encryption key from the hardware for every 64 bytes of the text. However, this overhead is small compared to the time it takes to encrypt 64 bytes.