This leaves out a bunch of prior innovations that *clearly* inspired the transformer authors.
Chief among them is the whole idea of attention, which was popularized by "Neural Machine Translation by Jointly Learning to Align and Translate" by @DBahdanau, @kchonyc, and Yoshua…