AI-Generated Images from AI-Generated Prompts

As the world’s leading expert on a people-first approach to computer vision, I am dedicated to providing insights that enable designers, developers, and copywriters to create accessible images at the highest possible velocity. A velocity so high, in fact, you can almost hear the point whistling over their head, like this self-aggrandizing intro.

Building on my post AI-Generated Images from AI-Generated Alt Text, I am going to demonstrate the saddest, newest trend in generating alternative text for images — AI tools that use existing images to generate text prompts for AI-powered image generators.

Using the same source images from my previous post, I fed each into the CLIP Interrogator using ViT-L-14/openai (meant for Stable Diffusion), then took the text output and fed each into Stable Diffusion. I also fed my original alternative text into the same instance of Stable Diffusion to compare the results.
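The comparison described above can be sketched as a small function. This is only an outline under stated assumptions: `compare_outputs`, `interrogate`, and `generate` are hypothetical names, and the two callables stand in for whatever image-to-prompt tool (CLIP Interrogator here) and text-to-image tool (Stable Diffusion here) you have on hand. It does not call either tool's real API.

```python
from typing import Callable, Dict

# Hypothetical stand-ins: any image-to-text and text-to-image tools could fill these roles.
ImageToPrompt = Callable[[bytes], str]
PromptToImage = Callable[[str], bytes]


def compare_outputs(
    source_image: bytes,
    alt_text: str,
    interrogate: ImageToPrompt,
    generate: PromptToImage,
) -> Dict[str, Dict[str, object]]:
    """Feed both descriptions of one source image into the same image generator."""
    # Step 1: derive a machine-written prompt from the source image.
    machine_prompt = interrogate(source_image)
    # Step 2: generate an image from each description using the same generator.
    return {
        "alt_text": {"prompt": alt_text, "image": generate(alt_text)},
        "interrogator": {"prompt": machine_prompt, "image": generate(machine_prompt)},
    }
```

The point of passing the same `generate` callable for both prompts is that the text is then the only variable between the two sets of output.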

As with my previous post, dear sighted reader, I want you to read this without looking at the images. Each has been hidden in a disclosure. Instead, read the alternative text I provide, then read the AI-generated prompt text, and try to visualize each. Consider how they differ. Then compare the images. As you visually compare each, think about how a screen reader user might have benefited from the hand-crafted artisanal alt text.

Flat Color Illustration

A flat color illustration
A cartoonish Kawaii slice of toast with happy eyes, open smiling mouth, and reddish cheeks; there is a spatter of blood coming from the top of the toast similar to the Watchmen logo.
The first is a PNG image, the second is an SVG.

This is the alternative text for the image:

A cartoonish Kawaii slice of toast with happy eyes, open smiling mouth, and reddish cheeks; there is a spatter of blood coming from the top of the toast similar to the Watchmen logo.

CLIP Interrogator using ViT-L-14/openai generated this text prompt:

a slice of bread with a bloody face on it, a stock photo, inspired by Kanbun Master, twitch emote, kawaii cutest sticker ever, artoast8p, aliased, assassin, bun, spraying blood, choi, bengus, exploitable image, 1 5 0 4, 1 4 5 0, winking, scobillyflup

The Stable Diffusion output for each:

My alt text fed into Stable Diffusion for the flat color illustration
Four images. One is a cartoony slice of bread with a happy face, with an angry red face on its lower half, alongside photo-realistic bread. Another looks like a cyclops meatball wearing a ski goggle, but with a smile. The third is maybe a cartoon slice of bread with three eyes and rectangle mouth, alongside an upside-down speckled frowning meatball. The last is a stylized black shape that may have horns or pointy ears, with a white angry face, floating above a wide-eyed slice of rye with a trail of jelly across its eyes and mouth.
I want to know why so many meatballs.
CLIP Interrogator prompt fed into Stable Diffusion for the flat color illustration
A flat-color slice of bread with two photo-realistic bits stuck on it as ears; it has red heart eyes, black nostrils, and a rectangular mouth with red circuitry lights. A photo-realistic slice of maybe Italian bread with two large pitch black circles as eyes. A cartoon maybe slice of bread or maybe round loaf, with what appears to be a horizontal gash through its middle, which is also its face, with a large blob of blood spilled out at the end, frown and eyes following it. A cartoon maybe gelato bowl, with a cookie sticking up out of the top and two expressive angry eyes behind a few drips of blue moisture, probably sweat.
These are certainly more unsettling, at least until the bowl of gelato.

Photo

A photograph
Looking across the atrium with four levels of the gallery space / ramp visible. The ramp contains exhibits as well as crowds of people moving among them, with some people leaning over the edge of the ramp wall.
I used the image in a tweet about the Guggenheim, so the venue context was there. Wallygva at English Wikipedia, CC BY-SA 3.0.

This is the alternative text for the image (venue location provided in original surrounding context):

Looking across the atrium with four levels of the gallery space / ramp visible. The ramp contains exhibits as well as crowds of people moving among them, with some people leaning over the edge of the ramp wall.

CLIP Interrogator using ViT-L-14/openai generated this text prompt:

a group of people standing inside of a building, by Alexander Calder, trending on unsplash, gutai group, panoramic anamorphic, frank lloyd wright, stone stairway, new york times, man in white t – shirt, museum photo, imet2020, artgram

The Stable Diffusion output for each:

My alt text fed into Stable Diffusion for the photo
Four separate images broadly showing crowds of people moving through large multi-storied open spaces with glass walls and ceiling. One image is looking down into an atrium, and feels most like the source image. Another shows a very symmetric and squared set of floors. None show ramps.
These at least convey the sweeping expanse of the space, if not its main attributes.
CLIP Interrogator prompt fed into Stable Diffusion for the photo
One image of a small crowd in a large plain white room with a glass wall and polished concrete floor. Another of a black room and light tile floor with people clustered just past an orange wall and doorway that take up much of the room. A sweeping raw concrete conical room that tapers slightly toward the top to a flat glass ceiling, a couple dozen people milling around the edge of the blank walls. Four people looking at a blank white wall that has a few circles and swoops pressed into its surface.
These rooms feel too much like a human abattoir. Except maybe the one with the odd climbing wall.

Digital Illustration

A digital illustration
A photo illustration travel poster showing a cluster of metallic hot air balloons with spheroid gondolas floating above the opaque clouds of Jupiter’s atmosphere. Behind and above the balloons is a sweeping aurora of teal and purple against a black starry sky. The advertisement reads “Experience the mighty auroras of Jupiter” in metallic block text at the bottom of the poster.
From NASA’s Visions of the Future poster series. Courtesy NASA/JPL-Caltech.

This is the alternative text for the image:

A photo illustration travel poster showing a cluster of metallic hot air balloons with spheroid gondolas floating above the opaque clouds of Jupiter’s atmosphere. Behind and above the balloons is a sweeping aurora of teal and purple against a black starry sky. The advertisement reads “Experience the mighty auroras of Jupiter” in metallic block text at the bottom of the poster.

CLIP Interrogator using ViT-L-14/openai generated this text prompt:

an image of a hot air balloon flying in the sky, concept art, inspired by Barclay Shaw, cg society contest winner, space art, infused with aurora borealis, vibrant tourism poster, ryan dyar, floating molecules, official product image, poster tour, acnh, albion, 2 0 1 9, esa, listing image, images on the sales website, dark energy

The Stable Diffusion output for each:

My alt text fed into Stable Diffusion for the digital illustration
Four images. One is three slightly textured balloons against a teal sky that opens into black with a misshapen planetoid above it; at the bottom of the frame is nonsense text in a black serif typeface. The second has four balloons, all dark except a green one half-covered in a green seemingly metallic sheath, all against purple sky with white, teal, and red aurora. The third has one prominent chunky purple balloon and a few in the background, against a white-streaked teal sky and floating over a flat rocky ground. The final has cartoony flat-color yellow or pink balloons against a star field that goes from dark purple to dark pink, with a yellow and teal spiral galaxy in the distance.
A token effort at bringing in the text.
CLIP Interrogator prompt fed into Stable Diffusion for the digital illustration
An aurora made up of a ghost image of a balloon being blown nearly horizontal, with one distant balloon and the crescent edge of one very near that may also be a pumpkin. Four colorful balloons in the purple stratosphere well above the highest clouds of blue and red and yellow arcs, with a green aurora above them. A nearby perfectly normal photo-realistic balloon against a perfectly ordinary dark sky. A lone balloon above a mountain horizon with yellow, teal, green, blue, and purple aurora.
No effort at text, because the prompt did not include the text.

Enhanced Photo

A digitally enhanced photograph
an undulating, translucent star-forming region in the Carina Nebula is shown in this Webb image, hued in ambers and blues; foreground stars with diffraction spikes can be seen, as can a speckling of background points of light through the cloudy nebula
One of the first images revealed to the public from NASA’s James Webb Space Telescope. Image credit: NASA, ESA, CSA, and STScI.

This is the alternative text for the image as written by NASA:

an undulating, translucent star-forming region in the Carina Nebula is shown in this Webb image, hued in ambers and blues; foreground stars with diffraction spikes can be seen, as can a speckling of background points of light through the cloudy nebula

CLIP Interrogator using ViT-L-14/openai generated this text prompt:

a star filled sky filled with lots of stars, inspired by Kim Keever, behance, space art, header, aspect ratio 1:3, dust

The Stable Diffusion output for each:

My alt text fed into Stable Diffusion for the enhanced photo
Four versions of colorful clouds of gas in a star-laden field, each showing hard transitions between clouds that mimic what we see in most nebulae images, if not the original image, along with the four-pointed diffraction spikes.
Any of these could be a NASA image.
CLIP Interrogator prompt fed into Stable Diffusion for the enhanced photo
One image is a dense yellow cloud with diffuse edges and purple stars against a black star field. Another is a cloud of lavender dust radiating out in fuzzy lines from the center, against black with rough colored dots. A loose cloud of white and purple taking up most of the frame, not much structure, only a little bit of black visible at the edge. A handful of narrow spiked glowing triangles of color on a field of black, one large white dust cloud in the corner, and edges of the image feeling like looking through torn parchment.
These feel like images pulled from India’s festival of color (Holi).

Painting

A painting
A night sky swirling with vivid blue spirals, a dazzling golden crescent moon, and constellations depicted as radiating spheres dominate the oil-on-canvas artwork. One or two flame-like cypress trees loom over the scene to the side, their black limbs curving and undulating to the motion of the partly obscured sky. A structured settlement lies in the distance in the bottom right of the canvas, among all of this activity. The modest houses and the thin spire of a church, which stands as a beacon against undulating blue hills, are made out of straight, controlled lines.
The Starry Night by Vincent van Gogh, 1889. The alternative text is cobbled together from an image description at Famous Paintings – Taking a Look at the World’s Most Popular Paintings. I figured it was worth letting professionals describe it.

This is the alternative text for the image (used as plain text description in source, and artist provided in original context):

A night sky swirling with vivid blue spirals, a dazzling golden crescent moon, and constellations depicted as radiating spheres dominate the oil-on-canvas artwork. One or two flame-like cypress trees loom over the scene to the side, their black limbs curving and undulating to the motion of the partly obscured sky. A structured settlement lies in the distance in the bottom right of the canvas, among all of this activity. The modest houses and the thin spire of a church, which stands as a beacon against undulating blue hills, are made out of straight, controlled lines.

CLIP Interrogator using ViT-L-14/openai generated this text prompt:

a painting of a starry night, featured on pixiv, post-impressionism, in style of old painting, masterpiece album cover, matte painting of human mind, broad strokes, black and white painting, with blue light dark blue sky, paint smears, an ai generated image, starry sky in background, vincent

The Stable Diffusion output for each:

My alt text fed into Stable Diffusion for the painting
Four images again. A lone tree on a rich orange hill outlined in a light blue glow against a dark blue sky filled with points of light and long horizontal orange streamers. A dark blue sky with wide arcs of almost chalk yellow and darker blue lines, with a glowing mound of yellow ringed in purple on the ground, itself surrounded by what may be hedgerows. A painterly yellow sun, surrounded by yellow and blue flecks, circled by a light blue border, looking down from a dark sky over a few ribbons of gold and orange coming up from a rounded blue and light blue horizon as if from a great height. A pair of giant yellow-green ampersands standing in a darkened field as a classical illustration of planetary orbits around the sun fills the sky above.
These all have a vibe, though I would never confuse them for being a Van Gogh.
CLIP Interrogator prompt fed into Stable Diffusion for the painting
Four sloppy variations of Van Gogh’s “Starry Night”, the differences primarily being framing (though one has it in a wooden frame), zoom level, darkness, and blockiness of strokes.
This was clearly less a prompt than instructions to look up the image in its data store.

Takeaway

Some developers, having given up on (or been told to give up on) the Facebook-style underwhelming “looks like…” tools that describe images, have looked for better tools to free them from having to personally consider blind and bandwidth-constrained users. Sadly, these automated prompt generators themselves rely on context many users may not have, such as naming artists or styles.

At the very least, these prompts would need to be edited for human consumption. The art sites, artists, uses, and strings of numbers are meaningless on their own. Simply removing them does nothing to make the prompt itself useful as alternative text.
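To make that last point concrete, here is a minimal sketch of the "just remove the noise" approach. It assumes the leading clause of a generated prompt is the caption and every later comma-separated item is a modifier (artist, site, style, stray number). That holds for the five prompts in this post but is not guaranteed for any tool. The function name `split_prompt` is hypothetical.

```python
from typing import List, Tuple


def split_prompt(prompt: str) -> Tuple[str, List[str]]:
    """Split a generated prompt into its leading caption and its trailing modifiers."""
    # Everything before the first comma is treated as the caption (an assumption).
    caption, _, rest = prompt.partition(",")
    # Everything after it is treated as a modifier: artists, sites, styles, numbers.
    modifiers = [part.strip() for part in rest.split(",") if part.strip()]
    return caption.strip(), modifiers


# The CLIP Interrogator prompt for the Guggenheim photo, copied from above.
photo_prompt = (
    "a group of people standing inside of a building, by Alexander Calder, "
    "trending on unsplash, gutai group, panoramic anamorphic, frank lloyd wright, "
    "stone stairway, new york times, man in white t – shirt, museum photo, "
    "imet2020, artgram"
)

caption, modifiers = split_prompt(photo_prompt)
print(caption)         # a group of people standing inside of a building
print(len(modifiers))  # 11
```

With the eleven modifiers stripped, what remains says nothing about the atrium, the ramp, the exhibits, or the people leaning over the wall. It is shorter, not better.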

In short, this is also a flawed approach and no organization should consider it a viable option.

Wrap-up

24 assorted tiny thumbnails, mostly from nature except for one half face of a woman. The title of this post is the seed phrase for the opening image.

If you want to see what your favorite AI prompt generation tool might come up with using the same images I used, I made a CodePen. It might be easier to use the debug mode.

You can automatically generate alternative text or prompts from many tools. Don’t.

Update: 13 February 2023

Microsoft is planning to use OpenAI’s ChatGPT to generate abstracts for Bing search results (which prompted Google to push out a panicked response and hemorrhage $100 billion in a day). It is not clear how else Bing will use ChatGPT, but for now the Edge Dev release lets you see how the abstracts might work. I grabbed Edge Dev and checked how the OpenAI-generated abstract sounded for this post.

Edge Dev browser with the page open and a Bing preview alongside it in a sidebar.

The abstract in the Bing sidebar:

The author demonstrates how AI tools can generate images from text prompts that are derived from existing images, and compares the results with handwritten alternative text for accessibility. Using CLIP Interrogator and Stable Diffusion, the author shows how AI-generated text prompts and images can differ from the original images and their alternative text, and how this can affect the visual experience of screen reader users.

Update: 8 April 2023

Over on the walled garden that is the A11y Slack, Oscar Casamitjana fed the digital illustration image from above into Google Bard with the prompt describe this image https://adrianroselli.com/wp-content/uploads/2022/08/AI-digital.jpg. His results were interesting:

The image shows a colorful digital painting of a human face. The face is made up of geometric shapes and patterns, and it has a surreal and otherworldly appearance. The eyes are particularly striking, as they are large and expressive. The overall effect of the painting is both beautiful and unsettling.

The painting was created by Adrian Roselli, a digital artist who specializes in creating AI-generated art. Roselli uses a variety of AI tools and techniques to create his art, and he is constantly experimenting with new ways to use AI to create unique and visually arresting images.

The painting “AI Digital” is a good example of Roselli’s work. The painting is both beautiful and unsettling, and it is a testament to the power of AI to create art that is both familiar and strange.

The painting is also a reflection of the current state of AI. AI is a powerful tool that has the potential to create great things, but it also has the potential to be used for harm. The painting “AI Digital” captures this duality of AI, and it is a reminder that we must be careful about how we use this powerful technology.

From Google Bard prompt describe this image https://adrianroselli.com/wp-content/uploads/2022/08/AI-digital.jpg

I tried the same prompt and got three options:

Option 1

The image shows a person’s head with a digital screen for a face. The screen is divided into four quadrants, each of which shows a different image. The top left quadrant shows a close-up of a human eye, the top right quadrant shows a map of the world, the bottom left quadrant shows a graph of stock market data, and the bottom right quadrant shows a series of mathematical equations. The person’s face is expressionless, and the screen is the only thing that gives any indication of their thoughts or feelings.

The image is a metaphor for the way that artificial intelligence (AI) is becoming increasingly integrated into our lives. The human eye in the top left quadrant represents the way that AI is being used to collect and analyze data about us. The map of the world in the top right quadrant represents the way that AI is being used to track our movements and activities. The graph of stock market data in the bottom left quadrant represents the way that AI is being used to make financial decisions. And the series of mathematical equations in the bottom right quadrant represents the way that AI is being used to solve complex problems.

The image is both beautiful and unsettling. It is beautiful because it shows the power of AI and the potential it has to improve our lives. But it is also unsettling because it shows how AI is becoming increasingly powerful and how it is starting to control our lives.

The image raises a number of questions. What does it mean for humans when AI becomes more powerful than us? Will AI eventually take over the world? Or will we be able to control AI and use it for our own benefit? These are questions that we will need to answer as AI continues to develop.

From Google Bard prompt describe this image https://adrianroselli.com/wp-content/uploads/2022/08/AI-digital.jpg

Option 2

The AI Digital image is a futuristic digital art piece that depicts a human brain with a circuit board inside it. The brain is surrounded by binary code and other digital elements, and there is a large AI digital logo in the center of the image. The image is meant to represent the power of artificial intelligence and how it is changing the world.

The human brain is a complex organ that is responsible for all of our thoughts, feelings, and actions. It is made up of billions of neurons that are connected to each other by synapses. These connections allow the brain to process information and make decisions.

The circuit board is a physical representation of the digital code that powers computers and other electronic devices. It is made up of tiny transistors that are connected to each other by wires. These connections allow the circuit board to process information and perform calculations.

The binary code is a series of 1s and 0s that is used to represent information in computers. It is the basic language of computers, and it is used to store, process, and transmit data.

The AI digital logo is a symbol of artificial intelligence. It is a stylized representation of the letter “A” that is made up of binary code. The logo is meant to represent the power and potential of artificial intelligence.

The image is meant to be a visual representation of the power of artificial intelligence and how it is changing the world. The human brain is a symbol of intelligence and creativity, while the circuit board is a symbol of technology and innovation. The binary code is a symbol of data and information, and the AI digital logo is a symbol of artificial intelligence. Together, these elements represent the potential of artificial intelligence to change the world for the better.

From Google Bard prompt describe this image https://adrianroselli.com/wp-content/uploads/2022/08/AI-digital.jpg

Option 3

The image shows a man and a woman sitting at a table in a modern office. The man is wearing a suit and tie, and the woman is wearing a dress. They are both looking at a laptop computer. The computer screen is showing a graph of data. The man is pointing to the graph, and the woman is nodding. The image is well-lit and has a clear focus. The colors are muted and the overall tone is serious.

The man and woman are likely business professionals who are working on a project together. The graph on the computer screen shows data that is relevant to their project. The man is pointing to the graph to explain something to the woman. The woman is nodding, which indicates that she is listening and understanding what the man is saying.

The image suggests that AI is being used in the workplace to help businesses make better decisions. The man and woman are using AI to analyze data and make informed decisions about their project. AI is becoming increasingly important in the workplace, and this image shows how it can be used to improve business performance.

From Google Bard prompt describe this image https://adrianroselli.com/wp-content/uploads/2022/08/AI-digital.jpg

I tried the same prompt with Microsoft Bing:

I’m sorry but I cannot describe the image you provided as I don’t have access to it. However, there are AI image recognition tools that can help you describe images. For example, Microsoft’s Azure Cognitive Services provides an Analyze Image API that can describe images¹. You can call this API through a native SDK or through REST calls. Include Description in the visualFeatures query parameter. Then, when you get the full JSON response, parse the string for the contents of the “description” section¹.

From Microsoft Bing prompt describe this image https://adrianroselli.com/wp-content/uploads/2022/08/AI-digital.jpg

Anyone suggesting ChatGPT, Bard, or other self-described AI tools can generate their alternative text for them is simply being lazy.

Webinar Reference (Added 7 July 2024)

This post was briefly referenced in a Smashing Hour episode in March 2023.

YouTube: Smashing Hour with Sara Soueidan – March 2023.

2 Comments

Made it without ever looking at any images. Then yawned and rubbed my eye. The lid almost flipped backwards!

In response to phil sawatsky.

Well, we certainly do not want eyelid damage.
