AI-Generated Images from AI-Generated Prompts
As the world’s leading expert on a people-first approach to computer vision, I am dedicated to providing insights that enable designers, developers, and copywriters to create accessible images at the highest possible velocity. A velocity so high, in fact, you can almost hear the point whistling over their head, like this self-aggrandizing intro.
Building on my post AI-Generated Images from AI-Generated Alt Text, I am going to demonstrate the saddest, newest trend in generating alternative text for images — AI tools that use existing images to generate text prompts for AI-powered image generators.
Using the same source images from my previous post, I fed each into the CLIP Interrogator using ViT-L-14/openai (meant for Stable Diffusion), then took the text output and fed each into Stable Diffusion. I also fed my original alternative text into the same instance of Stable Diffusion to compare the results.
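For anyone who wants to repeat the round trip locally, here is a minimal Python sketch. It assumes the clip-interrogator and diffusers packages are installed; I have not said how I ran the tools, so treat this as one possible setup. The checkpoint name, the output file naming, and the function names are assumptions for illustration.

```python
from pathlib import Path

CLIP_MODEL = "ViT-L-14/openai"  # the model named in this post
SD_CHECKPOINT = "CompVis/stable-diffusion-v1-4"  # assumption: the post names no checkpoint


def output_names(image_path: str) -> dict:
    """Build one output file name per prompt source (hypothetical naming scheme)."""
    stem = Path(image_path).stem
    return {
        "alt_text": f"{stem}-from-alt-text.png",
        "clip_prompt": f"{stem}-from-clip-prompt.png",
    }


def interrogate(image_path: str) -> str:
    """Ask CLIP Interrogator for a text prompt describing an existing image."""
    # Heavy imports are deferred so the file still loads without them installed.
    from PIL import Image
    from clip_interrogator import Config, Interrogator

    image = Image.open(image_path).convert("RGB")
    interrogator = Interrogator(Config(clip_model_name=CLIP_MODEL))
    return interrogator.interrogate(image)


def compare(image_path: str, alt_text: str) -> dict:
    """Generate one image from the human alt text and one from the CLIP prompt."""
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(SD_CHECKPOINT)  # load once, reuse
    prompts = {"alt_text": alt_text, "clip_prompt": interrogate(image_path)}
    names = output_names(image_path)
    for source, prompt in prompts.items():
        # Each call returns a list of PIL images; keep the first one.
        pipe(prompt).images[0].save(names[source])
    return prompts
```

Calling compare("toast.png", "A cartoonish Kawaii slice of toast…") would write two PNG files side by side and return both prompts so you can read them before you look at anything.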
As with my previous post, dear sighted reader, I want you to read this without looking at the images. Each has been hidden in a disclosure. Instead, read the alternative text I provide, then read the AI-generated prompt text, and try to visualize each. Consider how they differ. Then compare the images. As you visually compare each, think about how a screen reader user might have benefited from the hand-crafted artisanal alt text.
Flat Color Illustration
A flat color illustration

This is the alternative text for the image:
A cartoonish Kawaii slice of toast with happy eyes, open smiling mouth, and reddish cheeks; there is a spatter of blood coming from the top of the toast similar to the Watchmen logo.
CLIP Interrogator using ViT-L-14/openai generated this text prompt:
a slice of bread with a bloody face on it, a stock photo, inspired by Kanbun Master, twitch emote, kawaii cutest sticker ever, artoast8p, aliased, assassin, bun, spraying blood, choi, bengus, exploitable image, 1 5 0 4, 1 4 5 0, winking, scobillyflup
The Stable Diffusion output for each:
My alt text fed into Stable Diffusion for the flat color illustration

CLIP Interrogator prompt fed into Stable Diffusion for the flat color illustration

Photo
A photograph

This is the alternative text for the image (venue location provided in original surrounding context):
Looking across the atrium with four levels of the gallery space / ramp visible. The ramp contains exhibits as well as crowds of people moving among them, with some people leaning over the edge of the ramp wall.
CLIP Interrogator using ViT-L-14/openai generated this text prompt:
a group of people standing inside of a building, by Alexander Calder, trending on unsplash, gutai group, panoramic anamorphic, frank lloyd wright, stone stairway, new york times, man in white t – shirt, museum photo, imet2020, artgram
The Stable Diffusion output for each:
My alt text fed into Stable Diffusion for the photo

CLIP Interrogator prompt fed into Stable Diffusion for the photo

Digital Illustration
A digital illustration

This is the alternative text for the image:
A photo illustration travel poster showing a cluster of metallic hot air balloons with spheroid gondolas floating above the opaque clouds of Jupiter’s atmosphere. Behind and above the balloons is a sweeping aurora of teal and purple against a black starry sky. The advertisement reads “Experience the mighty auroras of Jupiter” in metallic block text at the bottom of the poster.
CLIP Interrogator using ViT-L-14/openai generated this text prompt:
an image of a hot air balloon flying in the sky, concept art, inspired by Barclay Shaw, cg society contest winner, space art, infused with aurora borealis, vibrant tourism poster, ryan dyar, floating molecules, official product image, poster tour, acnh, albion, 2 0 1 9, esa, listing image, images on the sales website, dark energy
The Stable Diffusion output for each:
My alt text fed into Stable Diffusion for the digital illustration

CLIP Interrogator prompt fed into Stable Diffusion for the digital illustration

Enhanced Photo
A digitally enhanced photograph

This is the alternative text for the image as written by NASA:
an undulating, translucent star-forming region in the Carina Nebula is shown in this Webb image, hued in ambers and blues; foreground stars with diffraction spikes can be seen, as can a speckling of background points of light through the cloudy nebula
CLIP Interrogator using ViT-L-14/openai generated this text prompt:
a star filled sky filled with lots of stars, inspired by Kim Keever, behance, space art, header, aspect ratio 1:3, dust
The Stable Diffusion output for each:
The NASA alt text fed into Stable Diffusion for the enhanced photo

CLIP Interrogator prompt fed into Stable Diffusion for the enhanced photo

Painting
A painting

This is the alternative text for the image (used as plain text description in source, and artist provided in original context):
A night sky swirling with vivid blue spirals, a dazzling golden crescent moon, and constellations depicted as radiating spheres dominate the oil-on-canvas artwork. One or two flame-like cypress trees loom over the scene to the side, their black limbs curving and undulating to the motion of the partly obscured sky. A structured settlement lies in the distance in the bottom right of the canvas, among all of this activity. The modest houses and the thin spire of a church, which stands as a beacon against undulating blue hills, are made out of straight, controlled lines.
CLIP Interrogator using ViT-L-14/openai generated this text prompt:
a painting of a starry night, featured on pixiv, post-impressionism, in style of old painting, masterpiece album cover, matte painting of human mind, broad strokes, black and white painting, with blue light dark blue sky, paint smears, an ai generated image, starry sky in background, vincent
The Stable Diffusion output for each:
My alt text fed into Stable Diffusion for the painting

CLIP Interrogator prompt fed into Stable Diffusion for the painting

Takeaway
Some developers, having given up on (or been told to give up on) the Facebook-style underwhelming “looks like…” tools that describe images, have looked for better tools to free them from having to personally consider blind and bandwidth-constrained users. Sadly, these automated prompt generators themselves rely on context many users may not have, such as naming artists or styles.
At the very least, these prompts would need to be edited for human consumption. The art sites, artists, users, and strings of numbers are meaningless on their own. Simply removing them does nothing to make the prompt itself useful as alternative text.
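To see how little is left after that clean-up, here is a minimal Python sketch. It assumes, based only on the five prompts in this post, that each prompt is a short caption followed by comma-separated modifiers; the function name split_prompt is hypothetical and not part of any tool.

```python
def split_prompt(prompt: str) -> tuple:
    """Split a generated prompt into its leading caption and the trailing modifiers."""
    # Assumption: everything before the first comma is the caption.
    caption, *modifiers = [part.strip() for part in prompt.split(",")]
    return caption, modifiers


# The CLIP Interrogator prompt for the flat color illustration, copied from above.
toast_prompt = (
    "a slice of bread with a bloody face on it, a stock photo, "
    "inspired by Kanbun Master, twitch emote, kawaii cutest sticker ever, "
    "artoast8p, aliased, assassin, bun, spraying blood, choi, bengus, "
    "exploitable image, 1 5 0 4, 1 4 5 0, winking, scobillyflup"
)

# My alt text for the same image, copied from above.
toast_alt = (
    "A cartoonish Kawaii slice of toast with happy eyes, open smiling mouth, "
    "and reddish cheeks; there is a spatter of blood coming from the top of "
    "the toast similar to the Watchmen logo."
)

caption, modifiers = split_prompt(toast_prompt)
print(caption)
print(f"{len(modifiers)} modifiers discarded")
print(f"{len(caption.split())} words kept vs {len(toast_alt.split())} words of alt text")
```

What survives is the caption alone, which says nothing about the eyes, the smile, the cheeks, or the Watchmen reference.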
In short, this is also a flawed approach and no organization should consider it a viable option.
Wrap-up
The title of this post is the seed phrase for the opening image.
If you want to see what your favorite AI prompt generation tool might come up with using the same images I used, I made a CodePen. It might be easier to use the debug mode.
You can automatically generate alternative text or prompts from many tools. Don’t.
2 Comments
Made it without ever looking at any images. Then yawned and rubbed my eye. The lid almost flipped backwards!
Well, we certainly do not want eyelid damage.