The Transformer is a neural network architecture based on the attention mechanism, proposed in the 2017 paper "Attention Is All You Need". Before processing by a transformer, the text is converted into a sequence of so-called tokens, which in turn are mapped to numeric embedding vectors. The advantage of transformers is that they contain no recurrent modules and can therefore be parallelized, requiring less training time than architectures such as RNNs and LSTMs. Various versions of transformers have become widespread as the basis of large language models (LLMs) such as GPT, Claude, LLaMA, and others.
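
As a rough illustration of the attention mechanism at the core of this architecture, the sketch below computes scaled dot-product attention over a short sequence of embedding vectors. The toy dimensions, the random projection matrices, and the function name scaled_dot_product_attention are illustrative assumptions, not code from the original paper.

```python
# A minimal sketch of scaled dot-product attention, the core transformer operation:
# Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.
# Shapes and dimensions here are illustrative assumptions.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K: arrays of shape (seq_len, d_k); V: array of shape (seq_len, d_v)."""
    d_k = Q.shape[-1]
    # Similarity scores between every query and every key, scaled by sqrt(d_k).
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over the key dimension turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output position is a weighted average of the value vectors.
    return weights @ V

# Toy example: a sequence of 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                            # token embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = scaled_dot_product_attention(x @ Wq, x @ Wk, x @ Wv)
print(out.shape)                                       # (4, 8)
```

Because every position attends to every other position in a single matrix multiplication, the whole sequence can be processed in parallel, which is the source of the training-time advantage over recurrent architectures mentioned above.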