OpenEuroLLM: Europe’s Ambitious Path to Digital Sovereignty

The European Union has embarked on an effort to develop truly open-source large language models (LLMs) through the OpenEuroLLM project. The initiative aims to create a series of foundation models for transparent artificial intelligence (AI) covering all EU languages. With final model iterations expected by 2028, OpenEuroLLM is linked to the broader EuroHPC project, which has a budget of approximately €7 billion. The project's goal is to enhance digital sovereignty by building open foundation LLMs designed specifically for European needs.

The project's partners include EuroHPC supercomputer centers in Spain, Italy, Finland, and the Netherlands. Additionally, academia and research organizations from Czechia, the Netherlands, Germany, Sweden, Finland, and Norway are actively involved. Corporate entities such as AMD-owned AI lab Silo AI, Aleph Alpha, Ellamind, Prompsit Language Engineering, and LightOn are also key contributors. The OpenEuroLLM project focuses on creating a core multilingual LLM designed for general-purpose tasks where accuracy is crucial.

OpenEuroLLM's budget primarily covers human resources and development efforts. The project may still welcome new participants from EU organizations under the EU funding program. Some training data might remain confidential, although auditors can request access to it. The first model versions are anticipated by mid-2026, with final iterations expected by 2028.

Jan Hajič, a key figure in the project, emphasizes the importance of open data and compliance with AI regulations. He acknowledges challenges in achieving openness but remains optimistic about the project's potential impact.

"We hope that most of the data [will be open], especially the data coming from the Common Crawl," – Jan Hajič

Hajič outlines the project's commitment to high-quality models and adherence to European standards.

"We want to have it as small but as high-quality as possible. We don’t want to release something which is half-baked, because from the European point-of-view this is high-stakes, with lots of money coming from the European Commission — public money." – Jan Hajič

The project aims to address language diversity and resource scarcity while maintaining cultural representation.

"That is the goal, but how successful we can be with languages with scarce digital resources is the question," – Jan Hajič

The focus on true benchmarks for various languages ensures fairness and accuracy in model development.

"But that’s also why we want to have true benchmarks for these languages, and not to be swayed toward benchmarks which are perhaps not representative of the languages and the culture behind them." – Jan Hajič

OpenEuroLLM aligns with Europe's recent AI successes driven by small, focused teams like Mistral AI and LightOn.

"Europe’s recent successes in AI shine through small focused teams like Mistral AI and LightOn — companies that truly own what they’re building," – Stasenko

These teams demonstrate responsibility in financial decisions, market strategies, and reputation management.

"They carry immediate responsibility for their choices, whether in finances, market positioning, or reputation." – Stasenko

The overarching goal is to maintain maximum openness within regulatory constraints.

"The goal is to have everything open. Now, of course, there are some limitations," – Jan Hajič

Hajič highlights the potential use of diverse data sources under European copyright directives while respecting distribution limitations.

"We want to have models of the highest quality possible, and based on the European copyright directive we can use anything we can get our hands on. Some of it cannot be redistributed, but some of it can be stored for future inspection." – Jan Hajič

Despite challenges in securing participation from certain entities, OpenEuroLLM remains focused on generative LLM development.

"I tried to approach them, but it hasn’t resulted in a focused discussion about their participation," – Jan Hajič
