High performing deep neural networks come at the cost of computational complexity that limits its practicality for deployment on portable devices. We propose Low-Rank Transformer (LRT), a memory-efficient and fast neural architecture that significantly reduces the parameters and boosts the speed in training and inference for end-to-end speech recognition. Our approach reduces the number of parameters of the network by more than 50% parameters and speed-up the inference time by around 1.26x compared to the baseline transformer model. The experiments show that LRT models generalize better and yield lower error rates on both validation and test sets compared to the uncompressed transformer model. LRT models outperform existing works on several datasets in an end-to-end setting without using any external language model and acoustic data.