
State-of-the-art AI capabilities vs humans

How smart are the latest AI models compared to humans? The list below compares the most capable AI systems with humans across various domains, and is regularly updated to reflect the latest developments.

Last update: 2024-04

Superhuman (Better than all humans)

  • Games: For many games (Chess, Go, StarCraft, Dota, Gran Turismo, etc.) the best AI is better than the best human.
  • Memory: An average human can remember about 7 items (such as numbers) at a time. Gemini 1.5 Pro can read and remember 99% of 7 million words.
  • Thinking speed: AI models can read thousands of words per second, and write at speeds far surpassing any human.
  • Learning speed: A model like Gemini 1.5 Pro can read an entire book in 30 seconds. It can learn an entirely new language and translate texts in half a minute.
  • Amount of knowledge: GPT-4 knows far more than any single human; its knowledge spans virtually every domain, and it even remembers specifics like URLs.
  • Storage efficiency: GPT-4 has about 1.7 trillion parameters, whereas the human brain has roughly 100 to 1000 times as many synapses. However, GPT-4 knows thousands of times more, storing more information in a smaller number of parameters.
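The parameter gap described above can be made concrete with a quick back-of-envelope calculation. Note that both figures are rough public estimates (the GPT-4 parameter count is unconfirmed, and synapse counts vary widely across sources), so this is illustrative only:

```python
# Back-of-envelope comparison; all figures are rough public estimates.
gpt4_params = 1.7e12                # estimated GPT-4 parameter count
synapses_low = 100 * gpt4_params    # ~100x: lower-bound brain estimate
synapses_high = 1000 * gpt4_params  # ~1000x: upper-bound brain estimate

print(f"GPT-4: {gpt4_params:.1e} parameters")
print(f"Brain: {synapses_low:.1e} to {synapses_high:.1e} synapses")
print(f"Ratio: {synapses_low / gpt4_params:.0f}x to "
      f"{synapses_high / gpt4_params:.0f}x")
```

Even on the lower-bound estimate, the brain has two orders of magnitude more "storage slots" than GPT-4, which is what makes GPT-4's breadth of knowledge per parameter notable.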

Better than most humans

Worse than most humans

  • Saying “I don’t know”. Virtually all large language models suffer from ‘hallucination’: making up information instead of admitting they do not know. This might seem like a relatively minor shortcoming, but it’s a very important one: it makes LLMs unreliable and strongly limits their applicability. However, studies show that larger models hallucinate far less than smaller ones.
  • Being a convincing human. GPT-4 can convince 54% of people that it’s a human, whereas actual humans convince each other 67% of the time. In other words, GPT-4 doesn’t yet consistently pass the Turing test.
  • Dexterous movement. No robot can move the way a human can, but we’re getting closer. The Atlas robot can walk, throw objects and do somersaults. Google’s RT-2 can turn objectives into actions in the real world, like “move the cup to the wine bottle”. Tesla’s Optimus robot can fold clothes, and Figure’s biped can make coffee.
  • Self-replication. All lifeforms on earth can replicate themselves. AI models could spread from computer to computer through the internet, but this requires a set of skills that AI models do not yet possess. A 2023 study lists a set of 12 tasks for self-replication, of which tested models completed 4. We don’t want to find out what happens if an AI model succeeds in spreading itself across the web.
  • Continual learning. Current SOTA LLMs separate learning (‘training’) from doing (‘inference’). Although LLMs can learn using their context, they cannot update their weights while being used. Humans learn and do at the same time. However, there are multiple potential approaches towards this. A 2024 study detailed some recent approaches for continual learning in LLMs.
  • Planning. LLMs are not yet very good at planning (e.g. reasoning about how to stack blocks on a table). However, larger models perform considerably better than smaller ones.
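The block-stacking planning task mentioned above is interesting precisely because it yields readily to classical search even while tripping up LLMs. As a point of contrast, here is a minimal breadth-first blocks-world planner (a toy sketch of my own for illustration, not code from any cited benchmark):

```python
from collections import deque

def plan_blocks(start, goal):
    """Find a shortest sequence of moves turning `start` into `goal`.

    A state is a set of stacks; each stack is a tuple of blocks listed
    bottom-to-top. A legal move picks up the top block of one stack and
    puts it on top of another stack, or on the table. Breadth-first
    search guarantees the returned plan is as short as possible.
    """
    start, goal = frozenset(start), frozenset(goal)
    queue = deque([(start, ())])
    seen = {start}
    while queue:
        state, plan = queue.popleft()
        if state == goal:
            return list(plan)  # each move is (block, destination)
        stacks = list(state)
        for i, src in enumerate(stacks):
            block = src[-1]
            others = stacks[:i] + stacks[i + 1:]
            base = [src[:-1]] if len(src) > 1 else []
            # Put the top block of `src` onto each other stack...
            for j, dst in enumerate(others):
                rest = others[:j] + others[j + 1:]
                nxt = frozenset(rest + base + [dst + (block,)])
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, plan + ((block, dst[-1]),)))
            # ...or onto the table (pointless if it is already alone).
            if len(src) > 1:
                nxt = frozenset(others + base + [(block,)])
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, plan + ((block, "table"),)))
    return None  # goal unreachable

# Example: turn the single stack C-A-B (bottom to top) into A-B-C.
print(plan_blocks([("C", "A", "B")], [("A", "B", "C")]))
```

A few dozen lines of exhaustive search solve instances like this optimally, which is why block stacking is a popular probe for whether LLMs can plan rather than pattern-match.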

The endpoint

As time progresses and capabilities improve, we move items from the lower sections to the top section. When certain dangerous capabilities are achieved, AI will pose new risks. At some point, AI will outcompete every human on every metric imaginable. When we have built this superintelligence, we will probably soon be dead. Let’s implement a pause to make sure we don’t get there.