During the first year of my Master's degree, I worked for three months as a research intern at the Barcelona Supercomputing Center (BSC). BSC is part of the European Processor Initiative (EPI), and the team I joined was designing a RISC-V processor with a long SIMD vector length. To benchmark the capabilities of such a processor, efficient libraries must be designed for applications such as linear algebra and deep learning.
The purpose of my work was to design efficient vectorized algorithms for deep learning kernels such as pooling, ReLU, and batch normalization. My internship contributed to Alexandre De Limas Santana's PhD, which focuses on vectorizing the convolution operator for the long SIMD processor developed at BSC.
SIMD architectures are very popular: most recent processors already provide SIMD instructions, usually with a fixed vector length between 64 and 512 bits. Long SIMD architectures, however, are relatively new, and designing efficient applications for such platforms presents several challenges.
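To make the difference concrete, here is a minimal sketch of the vector-length-agnostic "strip-mining" style that long SIMD ISAs such as RISC-V V encourage. The `VLMAX` constant and the scalar inner loop are stand-ins of my own (the real code would use a `vsetvl` instruction and a single vector max operation); they are not taken from the BSC implementation.

```c
#include <stddef.h>

/* Hypothetical hardware vector length in elements, e.g. a long SIMD
 * register wide enough to hold 512 float32 values. */
#define VLMAX 512

/* Strip-mined, vector-length-agnostic ReLU. Each iteration requests
 * "up to VLMAX" elements, mirroring RISC-V V's vsetvl mechanism, so
 * the same loop runs unchanged on any vector length. The inner scalar
 * loop stands in for one vector max instruction (e.g. vfmax). */
static void relu(const float *in, float *out, size_t n) {
    size_t i = 0;
    while (i < n) {
        /* vl = vsetvl(n - i): hardware grants at most VLMAX lanes */
        size_t vl = (n - i < VLMAX) ? (n - i) : VLMAX;
        for (size_t j = 0; j < vl; ++j)   /* one vector operation */
            out[i + j] = in[i + j] > 0.0f ? in[i + j] : 0.0f;
        i += vl;                          /* advance by granted vl */
    }
}
```

Because the loop never hard-codes the vector width, the same binary can exploit 512-bit registers today and much longer registers on a future long SIMD machine, which is the portability property the EPI design targets.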
Some of the challenges I identified are:
By following these principles, I obtained significant speedups of the vectorized implementations over the sequential ones, with batch normalization showing the largest gain. My implementations were written with the aim of being added to BSC's RISC-V vector extension of oneDNN.
If you want to learn more about my work, here are some useful resources:
I am grateful to my supervisors Marc Casas and Alexandre De Limas Santana for allowing me to work on such an interesting project.