During the first year of my Master's degree, I worked for three months as a research intern at the Barcelona Supercomputing Center (BSC). BSC is part of the European Processor Initiative (EPI), and the team I joined was designing a RISC-V processor with a long SIMD vector length. To benchmark the capabilities of such a processor, efficient libraries must be designed for applications such as linear algebra and deep learning.
The purpose of my work was to design efficient vectorized algorithms for deep learning kernels such as pooling, ReLU, and batch normalization. My internship contributed to Alexandre De Limas Santana's PhD, which focuses on vectorizing the convolution operator for the long SIMD processor developed at BSC.
SIMD architectures are very popular: most recent processors already provide SIMD instructions, usually with a fixed vector length between 64 and 512 bits. Long SIMD architectures, however, are relatively new, and designing efficient applications for such platforms presents several challenges.
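To make the difference concrete, here is a minimal sketch of the vector-length-agnostic "strip-mining" style that long SIMD ISAs such as RISC-V V encourage. The `VLMAX` constant and the scalar inner loop are stand-ins of my own (the real code would use a `vsetvl` instruction and a single vector max operation); they are not taken from the BSC implementation.

```c
#include <stddef.h>

/* Hypothetical hardware vector length in elements, e.g. a long SIMD
 * register wide enough to hold 512 float32 values. */
#define VLMAX 512

/* Strip-mined, vector-length-agnostic ReLU. Each iteration requests
 * "up to VLMAX" elements, mirroring RISC-V V's vsetvl mechanism, so
 * the same loop runs unchanged on any vector length. The inner scalar
 * loop stands in for one vector max instruction (e.g. vfmax). */
static void relu(const float *in, float *out, size_t n) {
    size_t i = 0;
    while (i < n) {
        /* vl = vsetvl(n - i): hardware grants at most VLMAX lanes */
        size_t vl = (n - i < VLMAX) ? (n - i) : VLMAX;
        for (size_t j = 0; j < vl; ++j)   /* one vector operation */
            out[i + j] = in[i + j] > 0.0f ? in[i + j] : 0.0f;
        i += vl;                          /* advance by granted vl */
    }
}
```

Because the loop never hard-codes the vector width, the same binary can exploit 512-bit registers today and much longer registers on a future long SIMD machine, which is the portability property the EPI design targets.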
Some of the challenges I identified are:
By following these principles, I obtained significant speedups of the vectorized implementations over the sequential ones, with batch normalization showing the largest gain. My implementations were written with the aim of being added to BSC's RISC-V vector extension of oneDNN.
If you want to learn more about my work, here are some useful resources:
I am grateful to my supervisors Marc Casas and Alexandre De Limas Santana for allowing me to work on such an interesting project.