12 December 2023

A group of researchers is working on a Polish version of ChatGPT, which they say will be better and completely free.

Existing AI systems such as ChatGPT and Google’s Bard have two general problems: they cost money to use, and they are closed (their algorithms cannot be examined or modified by users). They also have a problem specific to Poland: they were trained on little Polish-language content, which leads to more errors in their Polish responses than in their English ones.


Therefore, on 29 November 2023, on the eve of ChatGPT’s first birthday, six of Poland’s leading research units in artificial intelligence and linguistics signed an agreement to create a large language model (LLM) trained on Polish-language content. The signatories are Wrocław University of Technology (consortium leader), the National Research Institute NASK, the Information Processing Centre – National Research Institute (OPI PIB), the Institute of Computer Science Foundations of the Polish Academy of Sciences, the University of Łódź and the Institute of Slavic Studies of the Polish Academy of Sciences.

The PLLuM consortium has a common goal: to create the first open large-scale Polish language model and an intelligent assistant built on it. The entire project is to be carried out in accordance with good practices of ethical and responsible artificial intelligence, including data representativeness, transparency and fairness.

“We suspect that ChatGPT has not seen much Polish in relation to other languages during its training. Consequently, there is a good chance that it is overwriting some knowledge of Polish culture, customs and facts with data from other languages when preparing answers. In the course of the tests, we noticed that this is especially true of Polish culture and history, and that it also makes some grammatical and stylistic errors,” explains Dr Jan Kocoń of the Department of Artificial Intelligence. “It is in our interest to control this and to have control over the information that is related to our country.”

NASK adds that training PLLuM on “a significantly greater share of texts originally written in Polish and containing information about Poland (Polish science, art, history, law, economy and others) will increase the visibility of our language and culture, which are noticeably marginalised in currently available models”.

“We already have almost 300 gigabytes of text collected from various sources, and this number is growing all the time,” said Dr Kocoń.

The model is expected to analyse and cite Polish works much more readily, making its results better tailored to Polish users. This will be useful for scientists and entrepreneurs, among others. For the general public, PLLuM is to provide free access to innovative solutions, including an intelligent assistant. The model is to formulate questions in natural language, so that interacting with it should resemble a conversation with an official. Perhaps in the future, bureaucracy will move online and visits to public offices will no longer mean standing in queues.

The team behind PLLuM plans to make the first version available for open testing in the first half of 2024. They say that the project is to be carried out in accordance with ethical and responsible AI practices, including keeping the data representative, transparent and fair.