The Transformer is a neural network architecture based on the attention mechanism, proposed in the 2017 paper "Attention Is All You Need". Before processing by a transformer, the text is converted into a sequence of so-called tokens, which in turn are mapped to numeric embedding vectors. The advantage of transformers is that they contain no recurrent modules and can therefore be parallelized, requiring less training time than architectures such as RNNs and LSTMs. Various versions of transformers have become widespread as the basis of large language models (LLMs) such as GPT, Claude, LLaMA, and others.
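
As a rough illustration of the attention mechanism at the core of this architecture, the sketch below computes scaled dot-product attention over a short sequence of embedding vectors. The toy dimensions, the random projection matrices, and the function name scaled_dot_product_attention are illustrative assumptions, not code from the original paper.

```python
# A minimal sketch of scaled dot-product attention, the core transformer operation:
# Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.
# Shapes and dimensions here are illustrative assumptions.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K: arrays of shape (seq_len, d_k); V: array of shape (seq_len, d_v)."""
    d_k = Q.shape[-1]
    # Similarity scores between every query and every key, scaled by sqrt(d_k).
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over the key dimension turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output position is a weighted average of the value vectors.
    return weights @ V

# Toy example: a sequence of 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                            # token embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = scaled_dot_product_attention(x @ Wq, x @ Wk, x @ Wv)
print(out.shape)                                       # (4, 8)
```

Because every position attends to every other position in a single matrix multiplication, the whole sequence can be processed in parallel, which is the source of the training-time advantage over recurrent architectures mentioned above.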