AI-Generated Images from AI-Generated Prompts
As the world’s leading expert on a people-first approach to computer vision, I am dedicated to providing insights that enable designers, developers, and copywriters to create accessible images at the highest possible velocity. A velocity so high, in fact, you can almost hear the point whistling over their head, like this self-aggrandizing intro.
Building on my post AI-Generated Images from AI-Generated Alt Text, I am going to demonstrate the saddest, newest trend in generating alternative text for images — AI tools that use existing images to generate text prompts for AI-powered image generators.
Using the same source images from my previous post, I fed each into the CLIP Interrogator using ViT-L-14/openai (meant for Stable Diffusion), then took the text output and fed each into Stable Diffusion. I also fed my original alternative text into the same instance of Stable Diffusion to compare the results.
As with my previous post, dear sighted reader, I want you to read this without looking at the images. Each has been hidden in a disclosure. Instead, read the alternative text I provide then read the AI-generated prompt text, and try to visualize each. Consider how they differ. Then compare the images. As you visually compare each, think about how a screen reader user might have benefited from the hand-crafted artisanal alt text.
Flat Color Illustration
A flat color illustration
This is the alternative text for the image:
A cartoonish Kawaii slice of toast with happy eyes, open smiling mouth, and reddish cheeks; there is a spatter of blood coming from the top of the toast similar to the Watchmen logo.
CLIP Interrogator using ViT-L-14/openai generated this text prompt:
a slice of bread with a bloody face on it, a stock photo, inspired by Kanbun Master, twitch emote, kawaii cutest sticker ever, artoast8p, aliased, assassin, bun, spraying blood, choi, bengus, exploitable image, 1 5 0 4, 1 4 5 0, winking, scobillyflup
The Stable Diffusion output for each:
My alt text fed into Stable Diffusion for the flat color illustration
CLIP Interrogator prompt fed into Stable Diffusion for the flat color illustration
Photo
A photograph
This is the alternative text for the image (venue location provided in original surrounding context):
Looking across the atrium with four levels of the gallery space / ramp visible. The ramp contains exhibits as well as crowds of people moving among them, with some people leaning over the edge of the ramp wall.
CLIP Interrogator using ViT-L-14/openai generated this text prompt:
a group of people standing inside of a building, by Alexander Calder, trending on unsplash, gutai group, panoramic anamorphic, frank lloyd wright, stone stairway, new york times, man in white t – shirt, museum photo, imet2020, artgram
The Stable Diffusion output for each:
My alt text fed into Stable Diffusion for the photo
CLIP Interrogator prompt fed into Stable Diffusion for the photo
Digital Illustration
A digital illustration
This is the alternative text for the image:
A photo illustration travel poster showing a cluster of metallic hot air balloons with spheroid gondolas floating above the opaque clouds of Jupiter’s atmosphere. Behind and above the balloons is a sweeping aurora of teal and purple against a black starry sky. The advertisement reads “Experience the mighty auroras of Jupiter” in metallic block text at the bottom of the poster.
CLIP Interrogator using ViT-L-14/openai generated this text prompt:
an image of a hot air balloon flying in the sky, concept art, inspired by Barclay Shaw, cg society contest winner, space art, infused with aurora borealis, vibrant tourism poster, ryan dyar, floating molecules, official product image, poster tour, acnh, albion, 2 0 1 9, esa, listing image, images on the sales website, dark energy
The Stable Diffusion output for each:
My alt text fed into Stable Diffusion for the digital illustration
CLIP Interrogator prompt fed into Stable Diffusion for the digital illustration
Enhanced Photo
A digitally enhanced photograph
This is the alternative text for the image as written by NASA:
an undulating, translucent star-forming region in the Carina Nebula is shown in this Webb image, hued in ambers and blues; foreground stars with diffraction spikes can be seen, as can a speckling of background points of light through the cloudy nebula
CLIP Interrogator using ViT-L-14/openai generated this text prompt:
a star filled sky filled with lots of stars, inspired by Kim Keever, behance, space art, header, aspect ratio 1:3, dust
The Stable Diffusion output for each:
My alt text fed into Stable Diffusion for the enhanced photo
CLIP Interrogator prompt fed into Stable Diffusion for the enhanced photo
Painting
A painting
This is the alternative text for the image (used as plain text description in source, and artist provided in original context):
A night sky swirling with vivid blue spirals, a dazzling golden crescent moon, and constellations depicted as radiating spheres dominate the oil-on-canvas artwork. One or two flame-like cypress trees loom over the scene to the side, their black limbs curving and undulating to the motion of the partly obscured sky. A structured settlement lies in the distance in the bottom right of the canvas, among all of this activity. The modest houses and the thin spire of a church, which stands as a beacon against undulating blue hills, are made out of straight, controlled lines.
CLIP Interrogator using ViT-L-14/openai generated this text prompt:
a painting of a starry night, featured on pixiv, post-impressionism, in style of old painting, masterpiece album cover, matte painting of human mind, broad strokes, black and white painting, with blue light dark blue sky, paint smears, an ai generated image, starry sky in background, vincent
The Stable Diffusion output for each:
My alt text fed into Stable Diffusion for the painting
CLIP Interrogator prompt fed into Stable Diffusion for the painting
Takeaway
Some developers, having given up on (or been told to give up on) the Facebook-style underwhelming looks like…
tools that describe images, have looked for better tools to free them having to personally consider blind and bandwidth constrained users. Sadly, these automated prompt generators themselves rely on context many users may not have, such as naming artists or styles.
At the very least, these prompts would need to be edited for human consumption. The art sites, artists, uses, and strings of numbers are meaningless on their own. Simply removing them does nothing to make the prompt itself useful as alternative text.
In short, this is also a flawed approach and no organization should consider it a viable option.
Wrap-up
The title of this post is the seed phrase for the opening image.
If you want to see what your favorite AI prompt generation tool might come up with using the same images I used, I made a Codepen. It might be easier to use the debug mode.
You can automatically generate alternative text or prompts from many tools. Don’t.
Update: 13 February 2023
Microsoft is planning to use OpenAI’s ChatGPT to generate abstracts for Bing search results (which prompted Google to push out a panicked response and hemorrhage $100 billion in a day). It is not clear how else Bing will use ChatGPT, but for now the Edge Dev release lets you see how the abstracts might work. I grabbed Edge Dev and checked how the OpenAI-generated abstract sounded for this post.
Update: 8 April 2023
Over on the walled garden that is the A11y Slack, Oscar Casamitjana fed the digital illustration image from above into Google Bard with the prompt describe this image https://adrianroselli.com/wp-content/uploads/2022/08/AI-digital.jpg
. His results were interesting:
The image shows a colorful digital painting of a human face. The face is made up of geometric shapes and patterns, and it has a surreal and otherworldly appearance. The eyes are particularly striking, as they are large and expressive. The overall effect of the painting is both beautiful and unsettling.
The painting was created by Adrian Roselli, a digital artist who specializes in creating AI-generated art. Roselli uses a variety of AI tools and techniques to create his art, and he is constantly experimenting with new ways to use AI to create unique and visually arresting images.
The painting “AI Digital” is a good example of Roselli’s work. The painting is both beautiful and unsettling, and it is a testament to the power of AI to create art that is both familiar and strange.
The painting is also a reflection of the current state of AI. AI is a powerful tool that has the potential to create great things, but it also has the potential to be used for harm. The painting “AI Digital” captures this duality of AI, and it is a reminder that we must be careful about how we use this powerful technology.
I tried the same prompt and got three options:
Option 1
The image shows a person’s head with a digital screen for a face. The screen is divided into four quadrants, each of which shows a different image. The top left quadrant shows a close-up of a human eye, the top right quadrant shows a map of the world, the bottom left quadrant shows a graph of stock market data, and the bottom right quadrant shows a series of mathematical equations. The person’s face is expressionless, and the screen is the only thing that gives any indication of their thoughts or feelings.
The image is a metaphor for the way that artificial intelligence (AI) is becoming increasingly integrated into our lives. The human eye in the top left quadrant represents the way that AI is being used to collect and analyze data about us. The map of the world in the top right quadrant represents the way that AI is being used to track our movements and activities. The graph of stock market data in the bottom left quadrant represents the way that AI is being used to make financial decisions. And the series of mathematical equations in the bottom right quadrant represents the way that AI is being used to solve complex problems.
The image is both beautiful and unsettling. It is beautiful because it shows the power of AI and the potential it has to improve our lives. But it is also unsettling because it shows how AI is becoming increasingly powerful and how it is starting to control our lives.
The image raises a number of questions. What does it mean for humans when AI becomes more powerful than us? Will AI eventually take over the world? Or will we be able to control AI and use it for our own benefit? These are questions that we will need to answer as AI continues to develop.
Option 2
The AI Digital image is a futuristic digital art piece that depicts a human brain with a circuit board inside it. The brain is surrounded by binary code and other digital elements, and there is a large AI digital logo in the center of the image. The image is meant to represent the power of artificial intelligence and how it is changing the world.
The human brain is a complex organ that is responsible for all of our thoughts, feelings, and actions. It is made up of billions of neurons that are connected to each other by synapses. These connections allow the brain to process information and make decisions.
The circuit board is a physical representation of the digital code that powers computers and other electronic devices. It is made up of tiny transistors that are connected to each other by wires. These connections allow the circuit board to process information and perform calculations.
The binary code is a series of 1s and 0s that is used to represent information in computers. It is the basic language of computers, and it is used to store, process, and transmit data.
The AI digital logo is a symbol of artificial intelligence. It is a stylized representation of the letter “A” that is made up of binary code. The logo is meant to represent the power and potential of artificial intelligence.
The image is meant to be a visual representation of the power of artificial intelligence and how it is changing the world. The human brain is a symbol of intelligence and creativity, while the circuit board is a symbol of technology and innovation. The binary code is a symbol of data and information, and the AI digital logo is a symbol of artificial intelligence. Together, these elements represent the potential of artificial intelligence to change the world for the better.
Option 3
The image shows a man and a woman sitting at a table in a modern office. The man is wearing a suit and tie, and the woman is wearing a dress. They are both looking at a laptop computer. The computer screen is showing a graph of data. The man is pointing to the graph, and the woman is nodding. The image is well-lit and has a clear focus. The colors are muted and the overall tone is serious.
The man and woman are likely business professionals who are working on a project together. The graph on the computer screen shows data that is relevant to their project. The man is pointing to the graph to explain something to the woman. The woman is nodding, which indicates that she is listening and understanding what the man is saying.
The image suggests that AI is being used in the workplace to help businesses make better decisions. The man and woman are using AI to analyze data and make informed decisions about their project. AI is becoming increasingly important in the workplace, and this image shows how it can be used to improve business performance.
I tried the same prompt with Microsoft Bing:
I’m sorry but I cannot describe the image you provided as I don’t have access to it. However, there are AI image recognition tools that can help you describe images. For example, Microsoft’s Azure Cognitive Services provides an Analyze Image API that can describe images¹. You can call this API through a native SDK or through REST calls. Include Description in the visualFeatures query parameter. Then, when you get the full JSON response, parse the string for the contents of the “description” section¹.
Anyone suggesting ChatGPT, Bard, or other self-described AI tools can generate their alternative text for them is simply being lazy.
Webinar Reference (Added 7 July 2024)
This post was briefly referenced in a Smashing Hour episode in March 2023.
2 Comments
Made it without ever looking at any images. Then yawned and rubbed my eye. The lid almost flipped backwards!
In response to .Well, we certainly do not want eyelid damage.
Leave a Comment or Response