Form, function, and the giant gulf between drawing a picture and understanding the world

Drawing photorealistic images is a major accomplishment for AI, but is it really a step toward general intelligence? Since DALL-E 2 came out, many people have hinted at that conclusion. When the system was announced, Sam Altman tweeted that “AGI is going to be wild”; for Kevin Roose at The New York Times, such systems constitute clear evidence that “we’re in a golden age of progress in artificial intelligence.” (Earlier this week, Scott Alexander seemed to take apparent progress in these systems as evidence of progress toward general intelligence; I expressed reservations here.)

In assessing progress toward general intelligence, the critical question is this: how much do systems like DALL-E, Imagen, Midjourney, and Stable Diffusion really understand the world, such that they can reason about and act on that knowledge?