A pessimistic view of ChatGPT in order to be optimistic

What is a prosperous society?

Nick Hanauer and Eric Beinhocker argue that it is the availability of things that creates prosperity: safe food, antibiotics, air conditioning, the ability to travel…

In short, having solutions to human problems is what makes us wealthier, not the money in circulation.

If we apply this to ChatGPT, does it solve human problems?

Let’s start with a short look at the history of neural networks.

Thirty years ago these two papers were published:

The second one lists the problems of the neural network model:

(1) it cannot represent certain words

(2) it cannot learn many rules

(3) it can learn rules found in no human language

(4) it cannot explain morphological and phonological regularities

(5) it cannot explain the differences between irregular and regular forms

(6) it fails at its assigned task of mastering the past tense of English

(7) it gives an incorrect explanation for two developmental phenomena: stages of overregularization of irregular forms such as bringed, and the appearance of doubly-marked forms such as ated

(8) it gives accounts of two others (infrequent overregularization of verbs ending in t/d, and the order of acquisition of different irregular subclasses) that are indistinguishable from those of rule-based theories.

Was adding more hidden layers the solution? Scientists did not buy this idea.

What happens when you keep trying the same approach over and over again, with more data and a huge number of parameters, and you still get the same errors? It means the technology is straining against its limits.

Let’s analyze what’s important here.

1. The purpose of technology

David Marr is one of the heroes of cognitive science. He is well known for the three-level hypothesis, according to which any information-processing system must be understood at three levels:

(i) Computational theory of the problem

(ii) Representation and algorithm

(iii) Hardware implementation

(i) The computational theory level is what I want to focus on. It answers the why.

What is the goal of the computation? That is:

(a) what the computation is meant to achieve,

(b) what the purposes of the technology are, and

(c) why it is appropriate to compute certain things rather than other things.

(ii) Representation and algorithm. It answers the how:

(a) how can the computation be carried out?

(b) what is the representation for the input and output?

(c) what is the algorithm for the transformation?

(iii) Hardware implementation. It answers the what.

How can the representation and algorithm be realized physically? 
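
Marr himself illustrated the levels with a cash register, whose computational theory is addition. The sketch below follows that idea under our own assumptions (the function names and the sampled checks are hypothetical, not Marr's): a single statement of what is computed, and two different representations and algorithms that both satisfy it. Level (iii) is simply whatever hardware happens to run the code.

```python
# Level (iii), hardware implementation, is whatever machine runs this file.

# Level (i), computational theory: WHAT is computed and WHY. Here: addition,
# described by properties any correct adder must have, independent of machinery.
# (Only two properties are sampled; this is an illustration, not a full definition.)
def satisfies_addition_theory(add) -> bool:
    samples = [(0, 7), (12, 30), (999, 1)]
    return all(add(a, 0) == a and add(a, b) == add(b, a) for a, b in samples)


# Level (ii), representation and algorithm #1: decimal digits, added right to
# left with a carry, as taught at school. Non-negative integers only.
def add_decimal_digits(a: int, b: int) -> int:
    x, y = str(a)[::-1], str(b)[::-1]
    digits, carry = [], 0
    for i in range(max(len(x), len(y))):
        d = int(x[i]) if i < len(x) else 0
        e = int(y[i]) if i < len(y) else 0
        carry, digit = divmod(d + e + carry, 10)
        digits.append(str(digit))
    if carry:
        digits.append(str(carry))
    return int("".join(reversed(digits)))


# Level (ii), representation and algorithm #2: binary bits, where XOR adds
# without carrying and AND (shifted left) produces the carries.
# Non-negative integers only: with negative inputs this loop may never end.
def add_binary_bits(a: int, b: int) -> int:
    while b:
        a, b = a ^ b, (a & b) << 1
    return a


for adder in (add_decimal_digits, add_binary_bits):
    print(adder.__name__, satisfies_addition_theory(adder), adder(999, 1))
```

The point of the sketch is the separation: the level (i) check says nothing about digits or bits, which is why Marr treats the why as a question of its own rather than something that falls out of the algorithm.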

In AI, very surprisingly, there is very little computational theory in Marr’s sense.

For example, AI textbooks often decline to define the problem of intelligence. What they offer instead is something like: getting machines to do whatever it is that people do.

Has OpenAI defined the why behind ChatGPT? No. Only the how and the what.

2. What do we need to make sense of the three levels of the hierarchy of understanding?

Judea Pearl tells us that the most important things here are the input and the output: where the information comes from, and what question we want to answer with that information.

Beyond Deep Learning, we need DEEP UNDERSTANDING, and no AI system today meets that need.

Judea Pearl is a mathematician, computer scientist and philosopher.

Data is a window through which we try to interrogate the reality of the world around us.

Causal reasoning (i.e. human reasoning) can be very helpful in this regard.

Humans draw conclusions (inferences) through causality, and this is how we have constructed the world in our minds:

(i) What we look for

(ii) How to use it

(iii) How to use it to ultimately communicate with humans.

3. Is it true that this technology comes close to reasoning like a human being?

I still find it quite shocking that there are people who think ChatGPT can reason like a human being, and who say so without blushing.

So let’s cut to the chase: what makes us different?

Gary Marcus has been saying for many years that large language models cannot do the following:

GARY MARCUS is a leading voice in the field of AI. He is a scientist, best-selling author and serial entrepreneur. He anticipated many of today’s limitations decades in advance, and he is known for his research on human language development and cognitive neuroscience.

(i) ABSTRACTION. It is a key part of human cognition, and current AI still struggles with it.

(ii) REASONING. LLMs store patterns that they can reapply to new inputs; that is, they work for problems that follow a structure the model has seen before, but not necessarily for genuinely new problems.

(iii) COMPOSITIONALITY. Humans understand language in terms of wholes composed of parts. Current AI continues to struggle with it.

(iv) FACTUALITY. LLMs cannot be updated incrementally: you can’t just give them a new fact and have them incorporate it. They typically need to be fully retrained to incorporate new knowledge, and this is a serious problem.

Even Yann LeCun, Chief AI Scientist at Meta and one of the world’s leading AI visionaries, winner of the 2018 Turing Award (the AI equivalent of the Nobel Prize) and of the Princess of Asturias Award, says about Large Language Models:

This is exactly what we need to do: specify and enumerate what ChatGPT cannot do, because misusing it can lead to catastrophic situations for individuals, groups of individuals and society.

4. OpenAI’s paper

“GPT-4 is a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers.”

OpenAI did not provide much information about GPT-4 in its paper – not even the size of the model.

Instead, it emphasised its performance on professional licensing exams and other standardised tests.

We don’t know how it will affect certain professions, but OpenAI may have broken the cardinal rule of machine learning: don’t test on your training data.
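
A common way to check whether a benchmark question has leaked into a model's training data is to look for long word sequences (n-grams) that the question shares with the training corpus. The sketch below only illustrates that idea under stated assumptions: naive whitespace tokenization, an arbitrary n of 8, and a tiny invented corpus. It is not OpenAI's method, and the function names are hypothetical.

```python
def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of word n-grams in `text` (naive lowercase + whitespace tokenization)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def is_contaminated(test_item: str, training_docs: list[str], n: int = 8) -> bool:
    """Flag a test item that shares at least one n-gram with any training document.

    Hypothetical helper: n=8 is an illustrative choice, not a standard value.
    """
    test_grams = ngrams(test_item, n)
    return any(test_grams & ngrams(doc, n) for doc in training_docs)


# Toy data, invented for illustration: one "training" page and two "exam" questions.
training_docs = ["the quick brown fox jumps over the lazy dog near the river bank"]
leaked_question = "Q: the quick brown fox jumps over the lazy dog near what?"
fresh_question = "Q: which animal sleeps in the barn at night?"

print(is_contaminated(leaked_question, training_docs))  # True
print(is_contaminated(fresh_question, training_docs))   # False
```

Real contamination checks differ in tokenization, in n, and in how much overlap they tolerate. The underlying point is the one made above: if an exam question already appears in the training data, a high score measures memorization rather than problem solving.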

There is a bigger problem. Language models solve problems differently from the way humans do, so these results tell us very little about how a bot will perform when faced with the real problems that professionals face.

A lawyer’s job is not to answer bar exam questions all day long.

5. What are the risks of ChatGPT that have not yet been fixed?

Source: OpenAI’s GPT-4 paper, page 44.

(i) Hallucinations.

Closed-domain hallucinations: when the model includes information that was not in the provided context (the article, journal, post, etc.).

Open-domain hallucinations: when the model confidently provides false information about the world without reference to any particular input context.

(ii) Harmful content

Failing to take into account the context in which harm may appear is really serious: it means losing sight of the reality in which harmful situations take place.

(iii) Privacy

Please read the highlighted part of the text carefully:

(iv) Cybersecurity

(v) Accuracy

Pay attention to this diagram:

This is what the paper says in its “Limitations” section on page 10:

“Despite its capabilities, GPT-4 has similar limitations as earlier GPT models. Most importantly, it still is not fully reliable (it “hallucinates” facts and makes reasoning errors). Great care should be taken when using language model outputs, particularly in high-stakes contexts, with the exact protocol (such as human review, grounding with additional context, or avoiding high-stakes uses altogether) matching the needs of specific applications. See our System Card for details.

GPT-4 significantly reduces hallucinations relative to previous GPT-3.5 models (which have themselves been improving with continued iteration). GPT-4 scores 19 percentage points higher than our latest GPT-3.5 on our internal, adversarially-designed factuality evaluations (Figure 6).”

Yes, but even GPT-4 is still only at about 80% accuracy on those evaluations.
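
To make the 80% figure concrete, here is a back-of-the-envelope sketch. It rests on a strong simplifying assumption that the paper does not make: every answer is independently correct with the same probability, taken here as 0.80. The function names are hypothetical.

```python
def p_all_correct(accuracy: float, n_answers: int) -> float:
    """Chance that n answers are all correct, assuming each one is independently
    correct with the same probability (a simplifying assumption)."""
    return accuracy ** n_answers


def expected_errors(accuracy: float, n_answers: int) -> float:
    """Expected number of wrong answers among n answers."""
    return (1 - accuracy) * n_answers


accuracy = 0.80  # assumed: the "about 80%" read off the paper's factuality chart
for n in (5, 10, 20):
    print(f"{n} answers: all correct with probability {p_all_correct(accuracy, n):.1%}, "
          f"about {expected_errors(accuracy, n):.1f} expected errors")
```

Under these assumptions, the chance that five answers in a row are all correct is 0.8^5 ≈ 0.33, roughly one in three.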

And yet there are still professionals (or charlatans) who dare to say that this tool can be used to help people with tasks that require a great deal of professional knowledge, human interaction, empathy and a “clinical eye”. I am truly amazed at what greed can do to people.

7. What are the ethical and social risks of Large Language Models?

DeepMind published a paper in Dec 2021 about the ethical and social risks of harm from LLMs. They outlined six specific risk areas:

(i) Discrimination, Exclusion and Toxicity

(ii) Information Hazards

(iii) Misinformation Harms

(iv) Malicious Uses

(v) Human-Computer Interaction Harms

(vi) Automation, Access, and Environmental Harms.

Image by DeepMind

8. The breakdown of vigilance and its consequences

Let’s talk about ChatGPT and vigilance, or the lack of it.

The key reference here is Norman Mackworth’s study “The breakdown of vigilance during prolonged visual search”.

The cognitive psychologist Norman Mackworth made a fundamental discovery that is essential to how we use artificial intelligence:

When we humans are given a repetitive task in which we rarely have to do anything, we end up disengaging from the activity completely.

Norman H. “Mack” Mackworth (1917-2005) was a British psychologist and cognitive scientist and founder of modern vigilance research.

The term “vigilance” means paying close attention for long periods of time. Vigilance requires hard mental work and is stressful.

Something we humans tend to avoid.

CNET editors learned this the hard way: they published a series of AI-generated articles, and 41 of the 77 articles the AI tool wrote turned out to contain subtle but important errors.

All of them had been approved by editors who weren’t paying enough attention.

In Mackworth’s terminology, the editors’ vigilance was inadequate or non-existent.

We must be vigilant: we cannot let a technology with so many shortcomings do our work unsupervised.

9. Responsible AI

Responsible AI is Ethical AI built atop structural discrimination. It doesn’t seek to address the root of the problems; it seeks to apply fixes to problematic outcomes.

Ethical AI seeks to address the root of the problem in order to prevent structural discrimination and problematic outcomes.

We thought Microsoft had learned its lesson from the Tay failure.

Excuse me?

10. Reactions to the current situation in different parts of the world

March 29th 2023. An open letter calls for a 6-month moratorium on research experiments by tech giants, specifically on the development of Large Language Models (LLMs), a technology that has known risks with no known solutions. Elon Musk, Yuval Noah Harari and Steve Wozniak signed the letter along with thousands of others.

I signed the letter because I saw an opportunity for all of us to talk about the situation and find solutions, although I don’t see any realistic chance of the moratorium being implemented.

The letter does not ask for a ban on research; it asks for a shift in research.

It asks: “AI research and development should be refocused on making today’s powerful, state-of-the-art systems more accurate, safe, interpretable, transparent, robust, aligned, trustworthy, and loyal.”

“We have a perfect storm of corporate irresponsibility, widespread adoption, lack of regulation and a huge number of unknowns”.

Gary Marcus

March 30th 2023. The European Consumers’ Organisation (BEUC) calls on the EU to launch an investigation into ChatGPT and similar chatbots, following a complaint by the US civil society group CAIDP to the US Federal Trade Commission. USA v ChatGPT-4.

CONCERN. The AI Act may not come into force for years.

Ursula Pachl, deputy director-general of the European Consumers’ Organisation (BEUC): “We are not protected from the negative social impacts of this technology”.

March 31st 2023. The Italian Data Protection Authority bans ChatGPT because it does not comply with the GDPR.

Italy’s Garante believes ChatGPT has four problems under GDPR:

(i) OpenAI doesn’t have age control to stop people under the age of 13 from using the text generation system;

(ii) it can provide information about people that isn’t accurate;

(iii) people haven’t been told their data was collected.

(iv) Perhaps most importantly, its fourth argument claims there is “no legal basis” for collecting people’s personal information in the massive swells of data used to train ChatGPT.

11. Let’s focus on the real problems

Have we asked the people who will be discriminated against how they feel about these measures?

OpenAI used Kenyan workers paid less than $2 an hour to make ChatGPT less toxic.

Have we asked them what they think of ChatGPT and its impact on their lives?

Sam Altman, OpenAI’s CEO, finds it amusing to divide citizens into first and second class according to their economic status.

I am more afraid of short- to medium-term dangers than long-term ones: the spread of disinformation, the risk that people will rely on these systems for medical and emotional advice, even greater inequality, and negative social impacts that we could never have imagined.

And is it only now that you stop to think about the problems, after creating a model to sell to Microsoft for 10 billion dollars by focusing on the HOW, without even considering the WHY or at least developing a robust and secure application?

And this is happening for a cultural reason: a hierarchy of knowledge in which social change is treated as different from, and completely separate from, math, physics, engineering or any related field.

This is how people in tech companies work: they think their work is completely disconnected from its social impact, because coding is seen as the most important kind of knowledge.

12. Solutions

I see no other solution than for the world’s experts to talk to each other in search of strategic and tactical solutions to the problems these technologies are already causing in society, and to CREATE NEW NARRATIVES because the current ones are not working. Narratives create realities, not the other way round.

Rasmussen et al. (2018). Conclusions from the Santa Fe Institute Working Group on ‘Envisioning new modes of cultural and technological change’.

It is the mutual interactions between technology, the environment, social institutions, the economy and our culture that shape what it means to be human, and that allow us to build robust societies.

Science seeks to find and develop the truth, and narrative gives it meaning. It is the narratives that create realities, and not the other way around.

Science and narrative must communicate and therefore support each other.

13. Conclusions

Neural networks have been making the same mistakes with language for 30 years, and ChatGPT is no exception. If these problems have not been solved in 30 years, then this technology is at a dead end. Gary Marcus has been talking about this problem for many years, and he identifies four capabilities that humans have and LLMs completely lack: abstraction, reasoning, compositionality and factuality.

This is coupled with the fact that today’s technologies are created without a defined purpose, focusing instead on algorithm development and implementation. One consequence is that applications are not developed in a safe and robust way, as in the case of ChatGPT: we do not know their negative effects, and we have no legal means to defend ourselves.

In its paper, OpenAI identifies twelve unresolved problems that have a direct impact on people’s lives. To list some of them: hallucinations, privacy issues, harmful content and disinformation, significant cybersecurity limitations, and accuracy issues.

These are the problems of the model itself, but to them we must add the ethical and social problems arising from the misuse of this technology, including discrimination, exclusion and toxicity, misinformation harms, malicious uses, and automation, access and environmental harms.

On top of these modelling and ethical problems, Microsoft has laid off its entire AI ethics and society team, while justifying the wrong answers of its AI as “usefully wrong”.

Meanwhile, in both Europe and the US, legal reactions and calls for investigations have not been long in coming. Italy has already banned the tool for violating the GDPR, a letter has been signed in the US calling for a six-month moratorium on the largest experiments with this technology, and the European Consumers’ Organisation (BEUC) is asking the EU to open an investigation into ChatGPT.

But the fact is that we are completely unprotected.

The solution lies in the world’s experts talking to each other in search of new narratives, because the current ones are not working. Narratives create realities, not the other way round. We urgently need to create a new reality.
