LIBRA: a new benchmark for testing AI endurance

Russian Scientists Develop Unique AI Test for Long Texts

Russian researchers have developed the country's first comprehensive test suite for AI systems that work with large volumes of text. The new benchmark, LIBRA (Long Input Benchmark for Russian Analysis), is designed to assess how well modern language models can truly ‘retain in memory’ and analyze texts comparable in size to an entire book. Until now, no comparable tool existed for the Russian language, and existing tests did not allow for an objective comparison between neural networks.

LIBRA consists of 18 tasks, grouped by difficulty level, and covers a wide range of challenges—from finding specific information to carrying out complex logical and mathematical reasoning. This approach helps identify the strengths and weaknesses of different models and makes it possible to see exactly where artificial intelligence starts to ‘lose track’ in long documents.

LIBRA Structure

LIBRA is based on a multi-level testing principle. The first group of tasks requires finding unique phrases in a massive text—the classic ‘needle in a haystack’ challenge. The model must quickly and accurately locate the necessary information among thousands of lines. The next level demands not just finding data but providing meaningful answers to questions based on the content of the document.

The third category of tasks requires the neural network to analyze and correlate scattered facts found throughout the text. Superficial searching is not enough here—a true analytical approach is needed. The most challenging tests involve logic and mathematics: artificial intelligence must not only understand what it reads, but also draw conclusions based on the entire context.
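
The first task group, the ‘needle in a haystack’ search, can be illustrated with a short Python sketch. This is not LIBRA's actual code: the function names, the filler sentences, the passphrase, and the containment-based scoring rule are all assumptions made for illustration. The sketch hides one unique sentence at a random position inside a long filler text and checks whether an answer recovers it.

```python
import random


def build_haystack_sample(filler_sentences, needle, n_sentences, seed=0):
    """Hide one 'needle' sentence at a random position among filler sentences.

    Hypothetical helper for illustration only, not part of LIBRA.
    Returns the assembled text and the index at which the needle was inserted.
    """
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    body = [rng.choice(filler_sentences) for _ in range(n_sentences)]
    position = rng.randrange(n_sentences + 1)
    body.insert(position, needle)
    return " ".join(body), position


def contains_expected(model_answer, expected):
    """Return 1.0 if the expected phrase occurs in the answer (case-insensitive), else 0.0.

    Assumed scoring rule; a real benchmark may score answers differently.
    """
    return 1.0 if expected.lower() in model_answer.lower() else 0.0


# Toy data, invented for this example.
filler = [
    "The weather was calm that day.",
    "Nothing unusual happened in the town.",
    "The river flowed slowly past the mill.",
]
needle = "The secret passphrase is 'blue heron'."

text, position = build_haystack_sample(filler, needle, n_sentences=1000)
question = "What is the secret passphrase?"

# A model would receive `text` and `question`; here a hand-written answer is scored.
score = contains_expected("I think it is Blue Heron.", "blue heron")
```

Increasing `n_sentences` lengthens the haystack, which is how a test of this kind can probe progressively longer contexts.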

Openness and accessibility

LIBRA is designed as an open platform for researchers and developers. All tasks, datasets, and evaluation tools are openly available, and a public leaderboard allows results from different models to be compared in a fair and transparent environment. This approach fosters healthy competition and accelerates the development of Russian-language long-text processing technologies.

Maria Tikhonova, head of SberAI and an associate professor at HSE, emphasizes the importance of open tools for collaboration in an era of rapid AI development. According to her, LIBRA is not just a set of tasks but a comprehensive “sandbox” for testing and improving neural networks.

Test results

Seventeen popular language models have already been tested on LIBRA. The results were telling: even the most advanced systems begin to lose accuracy as text volume increases. The top performer among the tested models was GPT-4o, while the best open-source solution was GLM4-9B-Chat, available to Russian developers.

LIBRA lead developer Igor Churin notes that the limited ‘context window’ has long hindered the adoption of LLMs (Large Language Models) in real business processes and scientific research. The new benchmark provides a quantitative assessment of how models handle tasks that require analyzing tens of thousands of tokens, from lengthy articles to entire books.
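
One simple way to see where a model starts to lose accuracy as texts grow is to group per-sample results by input length. The Python sketch below assumes this bucketing approach for illustration. The function name, the bucket boundaries, and the toy results are hypothetical and do not reproduce LIBRA's evaluation code or any reported scores.

```python
from collections import defaultdict


def accuracy_by_length(results, bucket_edges):
    """Average correctness per context-length bucket.

    results: iterable of (n_tokens, correct) pairs.
    bucket_edges: ascending upper bounds in tokens (assumed values, not LIBRA's).
    Samples longer than the last edge are ignored.
    """
    totals = defaultdict(lambda: [0, 0])  # edge -> [hits, count]
    for n_tokens, correct in results:
        for edge in bucket_edges:
            if n_tokens <= edge:
                totals[edge][0] += int(correct)
                totals[edge][1] += 1
                break
    return {edge: hits / count for edge, (hits, count) in sorted(totals.items())}


# Toy results invented for this example, not measurements of any real model.
toy_results = [
    (3000, True), (3500, True),
    (7000, True), (7500, False),
    (30000, False), (31000, True), (31500, False),
    (120000, False),
]
scores = accuracy_by_length(toy_results, bucket_edges=[4000, 8000, 32000, 128000])

for edge, acc in scores.items():
    print(f"up to {edge} tokens: accuracy {acc:.2f}")
```

A table of this kind makes the drop in quality at longer inputs visible at a glance and lets models be compared bucket by bucket.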

Features and Prospects

LIBRA stands out not only for its scale but also for its approach to creating tasks. Fourteen of the eighteen tests were developed specifically for this project from open Russian-language sources. Working from native sources captures the specifics of the language and its cultural context, which translated datasets cannot do.

Aidar Bulatov from the MIPT Laboratory of Neural Systems and Deep Learning points out that before LIBRA, Russian developers lacked a unified standard for testing models on long texts. Now, everyone has the opportunity to objectively compare their solutions and identify areas for further development.

The project team plans to expand the benchmark by adding new task types and text domains. This will enable even deeper analysis of artificial intelligence capabilities and help identify its weaknesses.

For background: SberAI is a division of Sber focused on artificial intelligence research and development. Maria Tikhonova, who leads this department, is actively involved in building open platforms to advance AI in Russia. Igor Churin is a leading language model specialist and one of the founders of the LIBRA project. MIPT (Moscow Institute of Physics and Technology) is one of the key scientific partners supporting the initiative to advance domestic artificial intelligence technologies.

Fernando Molina 01.12.2025 22:08
