The AI Act, a former French minister, Microsoft, and copyright

What do the AI Act, a former French minister, Microsoft, and copyright have to do with each other?

During the AI Act negotiations last December, a lot happened that most people never knew or found out about. You may say it is all water under the bridge, but in a way we are already "suffering" the consequences.

The original proposal of the AIA (AI Act) classified AI systems by risk (in fact, that is what it regulates: the risk posed by AI systems): prohibited AI systems, high-risk AI systems, medium-risk AI systems, and low- or no-risk AI systems.

What the law does is regulate high-risk AI systems. I will not dwell on that here, but rather on the arrival of generative AI and the uproar it caused in the regulatory community, because if that classification had been left as it was, LLMs (Large Language Models) such as ChatGPT would have been practically unaffected by it.

Those who would suffer the consequences would be the European SMEs that would like to integrate this tool into their business, for example through the ChatGPT API. In that case, the full weight of the law would fall on them, drowning them in bureaucracy and costs, and threatening their very existence.

So what to do? A solution to this problem had to be found. Not regulating generative AI at all was considered, but that would have caused a social and political uproar. And if it was to be regulated, new rules had to be introduced into the law to adapt it to this new reality.

France, Germany and Italy rebelled against the regulation and threatened to boycott the law. The reason: France and Germany were developing their own generative AI tools, and the AIA could kill them. France has Mistral https://mistral.ai/ and Germany has Aleph Alpha https://aleph-alpha.com/.

Let's stop in France for a moment, because there is enough there to write a novel. One of Mistral's founders is Cédric O, who was France's Secretary of State for the Digital Sector from 2019 to 2022. During his political tenure, he defended tooth and nail the need to regulate technology.

Indeed, he went further: he talked about regulating oligopolies and business models rather than technology, in defense of the public interest.

https://www.euronews.com/next/2021/06/18/vivatech-2021-cedric-o-says-tech-oligopoly-must-be-regulated-to-defend-the-public-interest

But, like Groucho Marx: "Those are my principles, and if you don't like them... I have others." So Cédric O pulled out his contact book and bent the ear of his former boss. What class, Cédric!

https://www.politico.eu/article/france-warns-eu-parliament-against-killing-a-european-chatgpt/ https://www.ft.com/content/9339d104-7b0c-42b8-9316-72226dd4e4c0

At this point in the negotiations between the European Parliament, the European Commission and the Council (then chaired by Spain), things were heating up and a legal alternative had to be found, so an à la carte regulation of generative AI was quite literally created.

What was the final result? The consolidated version of the AI Act introduced a classification separate from the traditional high-, medium- and low-risk categories, with its own general set of obligations contained in Title VIIIA (Articles 52a–52e).

The classification system for General Purpose AI models (GPAI) was divided into three tiers: (i) standard, (ii) open licence, and (iii) systemic risk (determined solely by computational power).

Article 52a AIA says that a General Purpose AI model (GPAI) poses a systemic risk if [...] and I go straight to point 2: "if the cumulative amount of computation used for its training, measured in floating point operations (FLOPs), is greater than 10^25".

Why use FLOPs as a decision parameter? Because of the belief that greater computational resources indicate more sophisticated models, which may have broader societal implications.

Again, we fall into regulating technology. Well, not even that: we are regulating on the basis of computational resources alone, ignoring (i) the application context, (ii) the model architecture, and (iii) the quality of the training.

But there is another point that concerns me as much or more than the previous one: using the 10^25 FLOPs threshold as a risk parameter is VERY questionable. First, because LLMs trained with 10^24 or 10^23 FLOPs can be just as risky (e.g. GPT-3 or Bard). And second, because the trend is to shrink LLMs while maintaining high performance, and with it the associated risks, as in the case of Mistral's Mixtral 8x7B model.
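To make the threshold concrete, training compute is often estimated with the rough heuristic C ≈ 6 × N × D (6 FLOPs per parameter per training token). A minimal sketch of how a model would be classified under Article 52a, assuming that heuristic and using illustrative, publicly reported figures (not official numbers) for GPT-3:

```python
# Sketch: classify a model against the AI Act's 10^25 FLOPs systemic-risk
# threshold, using the common C ~= 6 * N * D training-compute heuristic.
# All model figures below are illustrative public estimates, not official data.

SYSTEMIC_RISK_THRESHOLD_FLOPS = 1e25  # Art. 52a(2) AIA

def training_flops(n_params: float, n_tokens: float) -> float:
    """Rough training-compute estimate: ~6 FLOPs per parameter per token."""
    return 6 * n_params * n_tokens

def is_systemic_risk(n_params: float, n_tokens: float) -> bool:
    return training_flops(n_params, n_tokens) > SYSTEMIC_RISK_THRESHOLD_FLOPS

# GPT-3: ~175B parameters trained on ~300B tokens -> ~3.15e23 FLOPs,
# roughly two orders of magnitude below the line: not "systemic risk".
print(is_systemic_risk(175e9, 300e9))   # False

# A hypothetical 1.8T-parameter model trained on 1T tokens would cross it.
print(is_systemic_risk(1.8e12, 1e12))   # True
```

Which is precisely the article's point: under a compute-only criterion, a model with GPT-3-class capabilities sits comfortably below the threshold, while the risk it poses in a given application context is never examined.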

All in all, the law was passed: much euphoria in Europe, thumbs-up photos, European joy and celebration at being the first law regulating AI. Better than nothing, of course, but my job is not to celebrate; it is to advise and protect.

The Mistral soap opera did not end there. Two months after the AIA was passed, Microsoft announced that it would invest 15 million euros in Mistral. Mistral played its dirty game while smoking a cigar at our expense.

https://www.theverge.com/24087008/microsoft-mistral-openai-azure-europe

Now let's put this whole circus aside and get down to the copyright story, which is leading authors down a bitter path. And what does Mistral have to do with it, you may ask? Wait, wait...

The lawyers and developers of Generative AI tools are desperately trying to find a legal and/or technical formula that will save them from all the lawsuits against them for copyright infringement.

How? By using synthetic data, i.e. data generated by another model, which in principle should not be protected by copyright. Specifically, by using a dataset called Cosmopedia, which contains synthetic textbooks, blog posts, stories and WikiHow-style articles.

The dataset contains over 30 million files and 25 billion tokens. It is the largest open synthetic dataset to date, and it has been released under a licence that allows commercial use.

But this dataset was created using a different model that was trained on massive amounts of copyrighted works without permission. And which model was it? Not surprisingly: Mistral’s Mixtral 8x7B.

And the AI Act does not consider it a systemic-risk AI model. So ends this story of a situation that has only just begun, and which we are still trying to figure out how to resolve.

As always, thank you for reading.
