The Fall of RNN / LSTM

A good article on “hierarchical neural attention encoders”, the next evolution in neural network design.

Then, in the following years (2015–16), came ResNet and Attention. One could then better understand that LSTMs were a clever bypass technique. Attention also showed that MLP networks could be replaced by averaging networks influenced by a context vector.
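To make that last point concrete, here is a minimal sketch (not from the article, names are illustrative) of attention as an averaging operation: a context vector scores each input vector, and the softmax of those scores weights an average of the inputs.

```python
import numpy as np

def attention_average(context, inputs):
    """Weighted average of `inputs` (n x d) driven by a `context` vector (d)."""
    scores = inputs @ context                 # similarity of each input to the context
    weights = np.exp(scores - scores.max())   # softmax, numerically stabilized
    weights /= weights.sum()
    return weights @ inputs                   # convex combination of the inputs

# Example: 5 input vectors of dimension 8, one context vector
rng = np.random.default_rng(0)
inputs = rng.normal(size=(5, 8))
context = rng.normal(size=8)
summary = attention_average(context, inputs)  # shape (8,)
```

The output is simply an average of the inputs, but one whose mixing weights change with the context vector, which is the sense in which attention can stand in for a fixed MLP combination of inputs.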