AI-Generated Images from AI-Generated Alt Text

Shapes that look like letters which are made up of tiny blurry photographs.

Dear sighted reader, I want you to read this post without looking at the images. Each has been hidden in a disclosure.

Instead, read the alternative text I provide and visualize how it may look. Then read the automatically generated alternative text, and try to visualize it then. Consider how they differ.

I took the original alt text and the most descriptive auto-generated alt text and fed each into Craiyon and Midjourney, two self-proclaimed AI tools for generating images from nothing more than a text prompt.

Start comparing the images. As you visually compare each, think about how a screen reader user might have benefited from the more descriptive alt text.

Obviously context matters for what alt text you choose, and the alt text might change for an image as it is used in other contexts.

Flat Color Illustration

A flat color illustration
A cartoonish Kawaii slice of toast with happy eyes, open smiling mouth, and reddish cheeks; there is a spatter of blood coming from the top of the toast similar to the Watchmen logo. A cartoonish Kawaii slice of toast with happy eyes, open smiling mouth, and reddish cheeks; there is a spatter of blood coming from the top of the toast similar to the Watchmen logo.
The first is a PNG image, the second is an SVG.

This is the alternative text for the image:

A cartoonish Kawaii slice of toast with happy eyes, open smiling mouth, and reddish cheeks; there is a spatter of blood coming from the top of the toast similar to the Watchmen logo.

I used these tools to automatically generate the following alternative text:

Microsoft Office (did not offer me an option for an SVG, so I made a PNG)
Icon
Microsoft Edge
Appears to be icon, appears to say one
Google Chrome
No description available
Apple iOS VoiceOver Recognition
(for the SVG) An illustration of a person’s face on a white surface. (for the PNG) An illustration of a heart with a face.

I chose the most descriptive auto-generated alternative text and fed it into two image generation tools along with my original alternative text. You can compare and contrast the differences:

My alt text fed into Craiyon for the flat color illustration
A set of 9 images that look like slices of bread, most of them darkened as if toasted, all with chaotic spatters of red, smiling mouths, and horror show eyes of black, white, or in one case, an X where an eye should be.
I have never been afraid of toast before. At least not the bread kind.
My alt text fed into Midjourney for the flat color illustration
4 images of toast. Two look like bread with red smears and cartoony happy faces, one looks like someone toasted the bread until it was black and then carved a Jack o' Lantern face, and the last one has four eyes and maybe four mouths with a painted-on grin below them.
Midjourney does not allow “blood” as a seed word, so I changed it to “jelly”. That is what sighted folks referred to the spatter as anyway.
Apple’s alt text fed into Craiyon for the flat color illustration
9 variations on a traditional red heart, each with happy eyes, a goofy smile, open- and closed-mouth, and a couple appear to be wearing glasses. Another appears to have mouse ears?
I went with Apple’s “An illustration of a heart with a face.” Unlike the effort with the toast, Craiyon went for a happier default expression.
Apple’s alt text fed into Midjourney for the flat color illustration
4 watercolor hearts, each with a face within; one face has deep wells of dark instead of eyes, another is just the nose with the rest of the face in shadow, another is a smiling young woman with more hearts on her cheeks, and another looks like someone wearing a red heart N95 mask.
I went with Apple’s “An illustration of a heart with a face.” Midjourney certainly has a vibe.

Photo

A photograph
Looking across the atrium with four levels of the gallery space / ramp visible. The ramp contains exhibits as well as crowds of people moving among them, with some people leaning over the edge of the ramp wall.
I used the image in a tweet about the Guggenheim, so the venue context was there. Wallygva at English Wikipedia, CC BY-SA 3.0.

This is the alternative text for the image (venue location provided in original surrounding context):

Looking across the atrium with four levels of the gallery space / ramp visible. The ramp contains exhibits as well as crowds of people moving among them, with some people leaning over the edge of the ramp wall.

I used these tools to automatically generate the following alternative text:

Microsoft Office (took two passes)
A group of people in a large room
Microsoft Edge
Appears to be a group of people in a large room
Google Chrome
Appears to be: art gallery which includes interior views as well as a large group of people, (when I arrowed down, then I heard) Solomon R. Guggenheim Museum, The Metropolitan Museum of Art
Apple iOS VoiceOver Recognition
A group of people walking on a white staircase.

I chose the most descriptive auto-generated alternative text and fed it into two image generation tools along with my original alternative text. You can compare and contrast the differences:

My alt text fed into Craiyon for the photo
9 distorted photo-realistic views showing the sweeping curved white concrete levels of the Guggenheim from the multi-story atrium, most of them showing the multi-paned skylight, and a couple showing a seemingly descending spiral of levels. One has what may be an escalator coming from the middle level, and another shows the levels turning back in on each other almost to form a knot.
I am impressed my description made Craiyon think of the Guggenheim without me stating it.
My alt text fed into Midjourney for the photo
4 sedate symmetrical views of a white gallery space with a few people. Each is topped with one, two, or three sweeping arcs of other levels as seen from an atrium, with the viewer standing back from the atrium itself. A couple images have a railing. There is no art visible.
Another case where my description was enough to make the AI, Midjourney, think of the Guggenheim without me stating it.
Google’s alt text fed into Craiyon for the photo
9 views of the broad circular white concrete levels of the Guggenheim as seen from either the top or bottom of the multi-story atrium. Some of the levels break out as bridges to other levels, another is made of the stacked circles of the exterior of the building, another breaks gravity by coming into itself at a 90 degree angle, and yet another spirals into itself like a giant Q.
I concatenated both of Google’s descriptions into one, using “art gallery which includes interior views as well as a large group of people, Solomon R. Guggenheim Museum, The Metropolitan Museum of Art.” I think if Google had not matched the image with the venue, Craiyon would not have come up with such a good set of images, even if they do not match the view of the original image. Bear in mind my original description did not name the venue.
Google’s alt text fed into Midjourney for the photo
3 images showing a stark lightly colored gallery wall with a massive piece of art; one piece of art is a crowd of people looking at art, another appears to be textured color blocks, and the third is a round gallery space with people at the edges. The fourth image of the set is a wide open room with a high curved ceiling and windows at the floor. People are milling about, but there is no art on the walls.
I concatenated both of Google’s descriptions into one, using “art gallery which includes interior views as well as a large group of people, Solomon R. Guggenheim Museum, The Metropolitan Museum of Art.” Even with that extra venue context, if you have never seen the Guggenheim then this would not be a good way to convey it.

Digital Illustration

A digital illustration
A photo illustration travel poster showing a cluster of metallic hot air balloons with spheroid gondolas floating above the opaque clouds of Jupiter’s atmosphere. Behind and above the balloons is a sweeping aurora of teal and purple against a black starry sky. The advertisement reads “Experience the mighty auroras of Jupiter” in metallic block text at the bottom of the poster.
From NASA’s Visions of the Future poster series. Courtesy NASA/JPL-Caltech.

This is the alternative text for the image:

A photo illustration travel poster showing a cluster of metallic hot air balloons with spheroid gondolas floating above the opaque clouds of Jupiter’s atmosphere. Behind and above the balloons is a sweeping aurora of teal and purple against a black starry sky. The advertisement reads “Experience the mighty auroras of Jupiter” in metallic block text at the bottom of the poster.

I used these tools to automatically generate the following alternative text:

Microsoft Office
Background pattern
Microsoft Edge
Appears to be background pattern, Appears to say EXPERIENCE THE MIGHTY AURORAS OF JUPITER
Google Chrome
Appears to say: EXPERIENCE THE MIGHTY AURORAS OF JUPITER
Apple iOS VoiceOver Recognition
A screenshot of the video game with text and the image of a hot air balloon. Experience the mighty auroras of Jupiter.

I chose the most descriptive auto-generated alternative text and fed it into two image generation tools along with my original alternative text. You can compare and contrast the differences:

My alt text fed into Craiyon for the digital illustration
9 images, each depicting hot air balloons floating above clouds. The color palette generally matches what I identified, with teals, purples, and blacks prominent. Auroras are visible in most, but some of the skies are white clouds and feel like Earth. None of them has any text.
At least the colors made it through, if not the text.
My alt text fed into Midjourney for the digital illustration
4 images, each with at least one hot air balloon reflecting the teal, purple, and amber of the sky, almost like a classic tritone print. One shows a single balloon in the center rising above clouds into a dark sky, another has the gondola in the clouds as the balloon reflects ambient light sources, another balloon is small against a banded circle reminiscent of Jupiter, and the last shows a pair of balloons above a purple alien mountain range silhouette as they ride horizontally arcing clouds or bands of auroras. None of them has any text.
No text, but I think thematically it does a good job.
Apple’s alt text fed into Craiyon for the digital illustration
9 images showing vastly different views of hot air balloons. Each is seen as if through a wide angle lens common to video games. The skies and ground are almost all clearly Earth, in a graphics style that feels like a video game. One set of balloons is in space, another apparently in a blocky desert Minecraft world. Most of the images have vestiges of status bars and radar maps at the edges. None of them has any text.
I opted for Apple’s auto-description again, “A screenshot of the video game with text and the image of a hot air balloon. Experience the mighty auroras of Jupiter.” I figured a video game might provide more than just calling it a pattern.
Apple’s alt text fed into Midjourney for the digital illustration
4 images showing balloons that generally have the banding texture of Jupiter, each floating among clouds. One image is just clouds, another shows the arcs of an atmosphere as if seen from a low-orbit height, the third has auroras with what may be a city in the far distance, and the last has auroras against a dusk desert sky with an edifice at the horizon. None of them has any text.
I opted for Apple’s auto-description again, “A screenshot of the video game with text and the image of a hot air balloon. Experience the mighty auroras of Jupiter.” I figured a video game might provide more than just calling it a pattern. Note how some of the balloons look like Jupiter.

Enhanced Photo

A digitally enhanced photograph
an undulating, translucent star-forming region in the Carina Nebula is shown in this Webb image, hued in ambers and blues; foreground stars with diffraction spikes can be seen, as can a speckling of background points of light through the cloudy nebula
One of the first images revealed to the public from NASA’s James Webb Space Telescope. Image credit: NASA, ESA, CSA, and STScI.

This is the alternative text for the image as written by NASA:

an undulating, translucent star-forming region in the Carina Nebula is shown in this Webb image, hued in ambers and blues; foreground stars with diffraction spikes can be seen, as can a speckling of background points of light through the cloudy nebula

I used these tools to automatically generate the following alternative text:

Microsoft Office
A galaxy with stars
Microsoft Edge
Appears to be a view of the earth from space
Google Chrome
Appears to be: nebula, also known as nebula, (when I arrowed down, then I heard) Nebula: James Webb Space Telescope, NASA
Apple iOS VoiceOver Recognition
An illustration of stars in the sky.

I chose the most descriptive auto-generated alternative text and fed it into two image generation tools along with my original alternative text. You can compare and contrast the differences:

My alt text fed into Craiyon for the enhanced photo
9 images showing variations on supernova remnant nebulae, all seen from a great distance to capture the entire nebula. They have chaotic whorls of gas and dust, generally following a red and amber color palette, with sections in blue. All are against rich star fields.
I think those are supernova remnant nebulae.
My alt text fed into Midjourney for the enhanced photo
4 images, each showing long fingers of gray dust clouds highlighted with teal, each against dark skies. Two of them show stars that appear to be painted checks. One set of clouds appears like two long arms cutting diagonally across the frame, while another looks almost like a rend in the fabric of space.
I feel like these were painted.
Google’s alt text fed into Craiyon for the enhanced photo
9 images showing variations on multi-lobed planetary nebulae, all seen from a great distance to capture the entire nebula. They have some sloppy symmetry on one or two axes and generally follow a red, amber, and blue color palette, with some yellow highlights. Occluding dust clouds are absent from most. All are against rich star fields.
Google provided the most context with “nebula, also known as nebula, Nebula: James Webb Space Telescope, NASA,” which probably again explains why this looks like space, even if not the right space. I think those are planetary nebulae.
Google’s alt text fed into Midjourney for the enhanced photo
4 images of a single massive roiling cloud of darkness. In one, the cloud is black with dark purple edges and a yellow heart hidden within. In another, the entire sky is exploding in yellow with a compact black cloud in the center, leaking purple and teal fringes. Another appears to be a center of yellow light spewing black smoke in streamers to the four cardinal points. The final shows a cloud billowing out from a central yellow light, purple ripples like lightning, all flowing into a sea of teal and yellow clouds.
Google provided the most context with “nebula, also known as nebula, Nebula: James Webb Space Telescope, NASA.” I have no idea why it went with purple and yellow.

Painting

A painting
A night sky swirling with vivid blue spirals, a dazzling golden crescent moon, and constellations depicted as radiating spheres dominate the oil-on-canvas artwork. One or two flame-like cypress trees loom over the scene to the side, their black limbs curving and undulating to the motion of the partly obscured sky. A structured settlement lies in the distance in the bottom right of the canvas, among all of this activity. The modest houses and the thin spire of a church, which stands as a beacon against undulating blue hills, are made out of straight, controlled lines.
The Starry Night by Vincent van Gogh, 1889. The alternative text is cobbled together from an image description at Famous Paintings – Taking a Look at the World’s Most Popular Paintings. I figured it was worth letting professionals describe it.

This is the alternative text for the image (used as plain text description in source, and artist provided in original context):

A night sky swirling with vivid blue spirals, a dazzling golden crescent moon, and constellations depicted as radiating spheres dominate the oil-on-canvas artwork. One or two flame-like cypress trees loom over the scene to the side, their black limbs curving and undulating to the motion of the partly obscured sky. A structured settlement lies in the distance in the bottom right of the canvas, among all of this activity. The modest houses and the thin spire of a church, which stands as a beacon against undulating blue hills, are made out of straight, controlled lines.

I used these tools to automatically generate the following alternative text:

Microsoft Office
A painting of a city
Microsoft Edge
Appears to be a tree with a city in the background.
Google Chrome
No description available
Apple iOS VoiceOver Recognition
A photo of illustrations and a painting.

I chose the most descriptive auto-generated alternative text and fed it into two image generation tools along with my original alternative text. As a bonus for you, dear reader, I ran each twice, giving the context of the artist on the second pass. You can compare and contrast the differences:

My alt text fed into Craiyon for the painting
9 images showing mostly vibrant dark blue curves and clouds on black backgrounds, punctuated with yellow highlights, with 5 images showing yellow crescent moons.
I think Craiyon stops paying attention to the hint after a while.
9 images in van Gogh's painterly style of wide, textured brush strokes defining sweeping curls of blue and white painted skies, wide bars of dark colors making up the ground and distant village, and each one showing a prominent tree of curving strokes. Each has large stars and a moon and a tall, narrow steeple in the far background.
I re-ran this and helped the AI by prefixing “A Vincent van Gogh painting of…”
Edge’s alt text fed into Craiyon for the painting
9 photos of a young tree, 5 of them devoid of leaves, all with a city skyline in the distant background.
Edge’s description came closest with “a tree with a city in the background.” Nowhere does it mention it is a painting, however. It makes sense that Craiyon would choose to render it as a photo.
9 paintings of a tree, mostly devoid of leaves, all with characteristic strong and abbreviated brush strokes. Each tree is alone in a field of mostly wheat color, with some near a fence or with a village in the background.
I gave the AI a boost and provided the kind of context a reader would have had, using “a painting of a tree with a city in the background, in the style of Vincent van Gogh.”
My alt text fed into Midjourney for the painting
4 images of large painted whorls of blue and raw sienna against a midnight blue sky. In two of the images the raw sienna whorls resolve into a yellow crescent moon; in another the full moon sits apart and above the orange.
Midjourney opted to render this as a painting.
4 images of textured brush strokes resolving into blues and oranges against a dark blue sky. Two of them represent the moon and stars as yellow disks, another shows a white crescent moon. The last one shows a field with a handful of spiky pine trees.
I gave the AI a second try and provided the kind of context a reader would have had, using “a painting of a tree with a city in the background, in the style of Vincent van Gogh.”
Edge’s alt text fed into Midjourney for the painting
4 hazy paintings of a tree with a prominent green crown with a city of spires rising in the background. In two images, the tree is coming from a spire, appearing to be both in the foreground and background.
Edge’s description came closest with “a tree with a city in the background.” Nowhere does it mention it is a painting, however.
4 paintings of a green and yellow crowned tree in the foreground, skylines behind them. One tree rests between two distant skylines, playing with perspective, seemingly done in watercolor. Another shows the tree rising from a spire as if the spire becomes smoke before reforming. Another shows the tree alone on a yellowing mound, horizontal brush strokes calling out the lumpy clouds.
I gave the AI a second try and provided the kind of context a reader would have had, using “a painting of a tree with a city in the background, in the style of Vincent van Gogh.”
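
The prompt adjustments described across these sections (swapping a word the tool rejects, joining two auto-generated descriptions into one, and adding painting or artist context) are simple string operations. Here is a minimal Python sketch of them, assuming nothing about how Craiyon or Midjourney accept input. The name `build_prompt` and its parameters are hypothetical, made up for illustration, and the prompts in this post may well have been typed by hand.

```python
import re


def build_prompt(descriptions, swaps=None, prefix=""):
    """Build one text prompt from one or more image descriptions.

    descriptions: list of alt text strings, joined with ", ".
    swaps: optional dict mapping a rejected word to its replacement.
    prefix: optional lead-in text, such as a medium or artist hint.
    All names here are hypothetical, not part of any tool's API.
    """
    # Join the non-empty descriptions into a single comma-separated prompt.
    prompt = ", ".join(d.strip() for d in descriptions if d.strip())

    # Replace whole words only, so "blood" does not also match "bloodhound".
    for word, replacement in (swaps or {}).items():
        prompt = re.sub(rf"\b{re.escape(word)}\b", replacement, prompt)

    return prefix + prompt


# Joining both of Google's descriptions for the photo.
photo_prompt = build_prompt([
    "art gallery which includes interior views as well as a large group of people",
    "Solomon R. Guggenheim Museum, The Metropolitan Museum of Art",
])

# Swapping a word the generator rejects.
toast_prompt = build_prompt(
    ["there is a spatter of blood coming from the top of the toast"],
    swaps={"blood": "jelly"},
)

# Wrapping Edge's description with painting and artist context.
painting_prompt = build_prompt(
    ["a tree with a city in the background", "in the style of Vincent van Gogh"],
    prefix="a painting of ",
)
```

None of this changes what the generator knows about the source image. It only controls which words reach it, which is why the added artist context shifts the results so much.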

Takeaway

I hear the refrain all the time that AI will solve digital accessibility. I see overlay vendors claim (lie) that their own (non-)AI solution can describe images for users. I talk to devs who assert that browsers can now write the alt text. Social media managers who rely on Facebook to stuff “looks like sandwich” into every photo of a concert flyer. Content writers who press a button and let Microsoft Word report everything as a phone. And so on.

Sighted users have lots of prior visual experience. A history of seeing things and conjuring up mental images that can fill the gaps when graphics fail to load and only the image alt text shows. Blind and low vision users have some of this prior visual experience too.

It turns out AI image generators also have lots of experience. They are fed visuals and, as a result, can make somewhat accurate (if accidental) guesses about what an anemic description might represent. We can see it above from Craiyon with the art gallery.

But look at how far they diverge from the source image when even the best auto-generated description is their seed. Now imagine the worst auto-generated description is in use. Now imagine a user who does not have that deep mental database of imagery. Now imagine that image is integral to understanding the concept of the page or task. Something necessary for a job, or booking a doctor visit, or knowing where their kid is going.

Think of that dopey buck-toothed heart with the glasses that should have been kawaii toast. When we rely on a self-declared AI tool to generate our image descriptions for us, everything might as well be a dopey heart for all the good it does.
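
One way to act on this is to look for auto-generated descriptions before they are published. The Python sketch below flags alt text that matches the generic results quoted in this post. It is a rough heuristic under stated assumptions, not a real detector: `looks_auto_generated` is a hypothetical name, and the phrase lists come only from the examples above.

```python
# Phrases taken from the auto-generated results quoted in this post.
# This list is an assumption for illustration; real tools emit many more.
GENERIC_PREFIXES = ("appears to be", "appears to say")
GENERIC_EXACT = {"icon", "background pattern", "no description available"}


def looks_auto_generated(alt_text):
    """Return True when alt text matches a known generic auto-caption."""
    # Normalize case, surrounding whitespace, and a trailing period.
    text = alt_text.strip().rstrip(".").strip().lower()
    return text in GENERIC_EXACT or text.startswith(GENERIC_PREFIXES)


samples = [
    "Appears to be a group of people in a large room",
    "No description available",
    "Looking across the atrium with four levels of the gallery space / ramp visible.",
]

# Keep only the samples that look like unedited tool output.
flagged = [s for s in samples if looks_auto_generated(s)]
```

A check like this misses plenty. It would not flag “A group of people in a large room” from Microsoft Office, for example, so it can prompt a human review but cannot replace one.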

Wrap-up

The title of this post is the seed phrase for the opening image.

If you want to see what your favorite alt text generation tool might come up with using the same images I used, I made a Codepen. It might be easier to use the debug mode.

You can automatically generate alternative text from many tools, including:

Don’t.


Thanks to Eric Bailey for getting the VoiceOver image recognition text for me, since my crusty iPad Mini 4th generation is too old to run that feature.

Update: 9 January 2023

Some folks have decided AI-generated image prompts (for feeding into AI image generators) are somehow a better way to provide alternative text. Using the format of this post, I show maybe not: AI-Generated Images from AI-Generated Prompts

Update: 13 February 2023

Microsoft is planning to use OpenAI’s ChatGPT to generate abstracts for Bing search results (which prompted Google to push out a panicked response and hemorrhage $100 billion in a day). It is not clear how else Bing will use ChatGPT, but for now the Edge Dev release lets you see how the abstracts might work. I grabbed Edge Dev and checked how the OpenAI-generated abstract sounded for this post.

Edge Dev browser with the page open and a Bing preview alongside it in a sidebar.

The abstract in the Bing sidebar:

The document is a critique of the use of AI tools to generate alternative text for images, and shows how they can produce inaccurate or misleading results. The document compares the original alternative text written by the author with the autogenerated alternative text from various sources, and then feeds them into two AI tools to generate images from text prompts.

3 Comments

Hi I would love to use some of these images as comparisons in our training on alt text. Would that be OK? Thanks

Hannah Thomas

In response to Hannah Thomas.

Hannah, based on your email address (and for those reading along who might assume this is a blanket statement), I have no issues with you using it in your training — with the following notes:

  1. the source images retain their licensing, and assuming I understand Craiyon and Midjourney terms, the generated images are all CC BY-NC 4.0;
  2. credit and a link back to this post are appreciated.
