For most of the history of the web, search engines were, in a very real sense, blind. They could not look at your photographs. When Google encountered an image, it did not see a sunset or a wrench or a plate of pasta — it saw a file, and it inferred the contents from the words wrapped around it. The filename. The alt text. The caption. The surrounding paragraph. We optimized images, in other words, by describing them in language, because language was the only thing the machine could read.
That world has ended. And the change is more profound than most site owners have yet absorbed.
In 2026, the AI vision models that increasingly power search — the Gemini and GPT-class systems behind AI Overviews, visual search, and the AI assistants people now lean on — actually look at the pixels. They do not infer your sunset from the word "sunset." They see the sunset. They break the image into a grid of small patches, convert those patches into a sequence of numerical vectors through a process called visual tokenization, and reason about what is actually depicted. The machine, after twenty-five years of blindness, has opened its eyes.
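To make the "grid of patches" idea concrete, here is a minimal sketch of that first step. It assumes a grayscale image stored as rows of pixel values and uses a hypothetical patch size of 2 so the output stays readable. Real vision encoders (ViT-style models) work on color images with patches of around 14 or 16 pixels, and they follow this step with a learned projection and position information, which are omitted here. The models behind commercial search do not publish their exact pipelines, so treat this only as an illustration of the general technique.

```python
from typing import List

Image = List[List[int]]  # grayscale image: rows of pixel intensities (0-255)


def patchify(image: Image, patch: int) -> List[List[int]]:
    """Split an image into non-overlapping patch x patch tiles, scanning
    left-to-right and top-to-bottom, and flatten each tile into a vector."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image dimensions must be multiples of the patch size")
    vectors = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Read the tile row by row and concatenate it into one flat vector.
            tile = [image[r][c]
                    for r in range(top, top + patch)
                    for c in range(left, left + patch)]
            vectors.append(tile)
    return vectors


if __name__ == "__main__":
    img = [
        [0, 0, 255, 255],
        [0, 0, 255, 255],
        [9, 9, 128, 128],
        [9, 9, 128, 128],
    ]
    for vector in patchify(img, 2):
        print(vector)
```

A 4x4 image with 2x2 patches becomes a sequence of four vectors. Each one is the raw material that the model's learned projection turns into a visual token, much as a sentence becomes a sequence of word tokens.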
This single fact rewrites a great deal of what we thought we knew about image SEO. So let us think it through carefully, because the implications are not what most people expect.
Just how big has visual search become?
Before we talk strategy, let me give you a sense of the scale, because it is easy to dismiss images as a side concern. They are not. Google Lens now processes more than twenty billion visual searches every month. Google Images, on its own, accounts for something on the order of 22% of all searches across the web, nearly a quarter of the total. People are searching with their cameras, pointing their phones at objects and asking "what is this?" and "where can I buy this?" in numbers that would have seemed absurd a few years ago.
So when I say images matter for discovery now, I do not mean as a nice-to-have. For many businesses — anything visual, anything physical, anything people see before they buy — the image surface is becoming a primary channel, not a decorative one.
What alt text is for now — and what it is not
Here is where I have to correct a misconception that even experienced people still carry. With the machine now able to see, a reasonable person might conclude that alt text no longer matters. If the AI can read the pixels directly, why bother describing them in words?
It is a logical thought. It is also wrong, and the reason is subtle and important.
Alt text has not died. Its job has changed. In the old world, alt text was the machine's only source of truth about an image — it was the description. In the new world, the machine forms its own understanding from the pixels, and alt text becomes something different: confirmation. Grounding. The AI looks at your image, develops a hypothesis about what it shows, and then checks that hypothesis against your alt text. When the two agree, the model's confidence rises. It now trusts its reading of your image, and it is far more willing to surface and cite that image because it is sure of what the image is.
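One way to picture this "agreement" is the shared embedding space used by contrastive image-text models such as CLIP: the image and the text are each mapped to a vector, and the two are compared by cosine similarity. The sketch below uses tiny made-up vectors in place of real embeddings. It illustrates the comparison only; it is not a description of how any search engine actually weighs alt text.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional "embeddings", made up for illustration. Real models
# use hundreds of dimensions learned from data.
image_vec = [0.9, 0.1, 0.2]    # what the model "saw" in the pixels
honest_alt = [0.8, 0.2, 0.1]   # alt text that truthfully describes the picture
stuffed_alt = [0.1, 0.9, 0.3]  # keyword list unrelated to what is depicted

if __name__ == "__main__":
    print("honest: ", round(cosine_similarity(image_vec, honest_alt), 3))
    print("stuffed:", round(cosine_similarity(image_vec, stuffed_alt), 3))
```

The honest caption points in nearly the same direction as the image vector, while the keyword string points somewhere else entirely, which is the intuition behind "confirmation" in the paragraph above.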
So alt text is no longer the description the machine reads instead of looking. It is the caption the machine reads to confirm what it saw. That is a meaningful shift in how you should write it. Vague, keyword-stuffed alt text — the "best plumber Chicago emergency service" nonsense people used to cram in — does not confirm anything, because it does not actually describe the image. Honest, specific, descriptive alt text that genuinely matches the picture does exactly the grounding job the model now needs. The accessibility-first instinct, describing the image truthfully for a person who cannot see it, turns out to be precisely the instinct that serves the AI as well. The good practice and the effective practice have converged.
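If you want a quick first pass over existing alt text, a crude heuristic can catch the worst keyword lists: real descriptions of a scene almost always contain small function words ("a", "in", "on", "with"), while stuffed strings rarely do. The sketch below is an assumption-laden illustration, not a standard and not a check any search engine is known to run. The word list and thresholds are arbitrary, and anything it flags still needs a human to look at the actual image.

```python
import re
from typing import List

# Hand-picked English function words (an arbitrary assumption, not a standard).
FUNCTION_WORDS = {"a", "an", "the", "in", "on", "of", "with", "at", "under",
                  "beside", "and", "from", "into", "over", "by", "for", "to"}


def alt_text_warnings(alt: str) -> List[str]:
    """Return heuristic warnings for alt text that may not describe an image."""
    words = re.findall(r"[a-z0-9']+", alt.lower())
    if not words:
        return ["empty alt text (correct only for purely decorative images)"]
    warnings = []
    if words[:2] in (["image", "of"], ["picture", "of"], ["photo", "of"]):
        warnings.append("redundant 'image of' prefix")
    # Real scene descriptions nearly always use some function words.
    if len(words) > 3 and not FUNCTION_WORDS.intersection(words):
        warnings.append("reads like a keyword list, not a description")
    # Repeating the same content word is a classic stuffing pattern.
    content = [w for w in words if w not in FUNCTION_WORDS]
    if len(set(content)) < len(content):
        warnings.append("repeats the same keyword")
    return warnings


if __name__ == "__main__":
    for alt in ("best plumber Chicago emergency service",
                "A plumber tightening a leaking pipe under a kitchen sink"):
        print(repr(alt), alt_text_warnings(alt))
```

The stuffed example gets flagged as a keyword list; the descriptive one passes cleanly. Passing this check proves nothing about accuracy, of course: only someone looking at the picture can confirm that the words match it.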
The surprising new ranking signal: image quality
Now here is the part that genuinely surprised me when I first understood it, and it changes how you should think about every image you publish.
Because the AI is reading the actual pixels, the quality of those pixels is now a ranking and citation factor in its own right. Not the file size, not the load speed — the visual clarity.
Consider what happens when you take a perfectly good photograph and crush it through aggressive lossy compression to save a few kilobytes. To a human eye, it might look slightly soft, perhaps a little blocky, but recognizable. To a vision model, those compression artifacts are noise injected directly into the visual tokens. The grid of patches it reads becomes muddy and ambiguous. And an AI working from muddy visual tokens can misread the image entirely — it can hallucinate objects that are not there, or fail to recognize objects that are, simply because the "visual words" you handed it were slurred.
Think of it like trying to understand someone speaking through a bad phone connection. The words are technically there, but the static makes you guess, and guesses are sometimes wrong. A heavily degraded image is a bad phone connection to the machine's eyes.
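To see what "slurred" means at the level of a single patch, here is a toy model, not a real codec. It coarsely quantizes a patch's pixel values and measures how far the result drifts from the original. Lossy formats such as JPEG also quantize, though they do it on frequency coefficients rather than raw pixels, so this is only an analogy. The patch values and step sizes are arbitrary choices for illustration.

```python
from typing import List


def quantize(values: List[int], step: int) -> List[int]:
    """Snap each 0-255 value to the middle of its bucket of width `step`:
    a crude stand-in for detail a lossy encoder discards (step=1 is lossless)."""
    return [min(255, (v // step) * step + step // 2) for v in values]


def mean_abs_error(a: List[int], b: List[int]) -> float:
    """Average per-pixel difference between two equal-length patches."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    # A smooth 8-pixel gradient, like a strip of sunset sky (values made up).
    patch = [100, 106, 112, 118, 124, 130, 136, 142]
    for step in (1, 8, 32, 64):
        coarse = quantize(patch, step)
        print(step, coarse, mean_abs_error(patch, coarse))
```

At the coarsest step the gentle gradient collapses into two flat blocks, `[96, 96, 96, 96, 96, 160, 160, 160]`, separated by a hard edge that was never in the scene. That is exactly the kind of spurious structure a model reading those patches could mistake for a real feature.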
This creates a genuine tension with an old habit. For years, the entire thrust of image optimization was: make it smaller, make it lighter, compress it harder, because page speed and Core Web Vitals rewarded the lightest possible files. That advice was correct, and it is still partly correct — a multi-megabyte hero image will still wreck your load time and your Core Web Vitals, and that still matters enormously. But the lesson of 2026 is that you can no longer optimize purely for the smallest file. You have to find the point where the image is light enough to load fast and clear enough for a machine to read confidently. Compress with respect, not with violence.
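Finding that point can be framed as a small selection problem: encode the image at several quality settings, score each result for clarity against the original, and keep the smallest file that still clears your floor. In the sketch below the sizes and clarity scores are made-up placeholders. In practice the sizes would come from your encoder and the scores from a perceptual metric such as SSIM, and the 0.95 floor is an arbitrary assumption, not a published threshold.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Encoding:
    quality: int      # encoder quality setting used for this trial
    size_bytes: int   # resulting file size
    clarity: float    # perceptual score vs. the original; 1.0 = identical


def pick_encoding(candidates: List[Encoding], min_clarity: float) -> Optional[Encoding]:
    """Return the smallest candidate whose clarity meets the floor, or None."""
    acceptable = [c for c in candidates if c.clarity >= min_clarity]
    return min(acceptable, key=lambda c: c.size_bytes, default=None)


if __name__ == "__main__":
    # Placeholder measurements for illustration only, not real benchmarks.
    trials = [
        Encoding(quality=90, size_bytes=210_000, clarity=0.99),
        Encoding(quality=75, size_bytes=120_000, clarity=0.97),
        Encoding(quality=50, size_bytes=70_000, clarity=0.93),
        Encoding(quality=25, size_bytes=38_000, clarity=0.85),
    ]
    print(pick_encoding(trials, min_clarity=0.95))
```

With these placeholder numbers the quality-75 encoding wins: it is the lightest one that still clears the floor. If even that file is too heavy for your page, the better lever is usually smaller pixel dimensions for the display slot, not harder compression.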
In practice this means using modern formats sensibly. WebP and AVIF typically produce substantially smaller files than JPEG and PNG at comparable visual quality, preserving the clarity the vision models need. They let you have both the speed and the legibility, which is exactly the balance the new world demands. Serve images at resolutions sized to their display context, never larger than needed, and compress them only as far as the detail survives, never so hard that it dissolves. The goal is an image that is simultaneously fast for the browser and crisp for the machine.
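On the delivery side, the standard HTML pattern is a `<picture>` element that offers AVIF and WebP sources with `srcset` width descriptors and falls back to JPEG for browsers that support neither. The Python helper below simply assembles that markup. The file-naming scheme (`name-480w.avif` and so on), the widths, and the `sizes` value are hypothetical choices you would adapt to your own layout and build pipeline.

```python
from html import escape
from typing import Sequence


def picture_markup(base: str, alt: str, widths: Sequence[int], sizes: str,
                   width: int, height: int, lazy: bool = True) -> str:
    """Build a <picture> element: AVIF and WebP sources plus a JPEG fallback.

    Hypothetical naming scheme: assumes files such as f"{base}-{w}w.avif",
    f"{base}-{w}w.webp" and f"{base}-{w}w.jpg" exist for every width w.
    """
    def srcset(ext: str) -> str:
        # Width descriptors let the browser pick the smallest adequate file.
        return escape(", ".join(f"{base}-{w}w.{ext} {w}w" for w in widths))

    # Don't lazy-load above-the-fold hero images; it delays their rendering.
    loading = ' loading="lazy"' if lazy else ""
    s = escape(sizes)
    return "\n".join([
        "<picture>",
        # Browsers use the first <source> whose type they support.
        f'  <source type="image/avif" srcset="{srcset("avif")}" sizes="{s}">',
        f'  <source type="image/webp" srcset="{srcset("webp")}" sizes="{s}">',
        f'  <img src="{escape(base)}-{max(widths)}w.jpg" srcset="{srcset("jpg")}" sizes="{s}"',
        f'       alt="{escape(alt)}" width="{width}" height="{height}"{loading}>',
        "</picture>",
    ])


if __name__ == "__main__":
    print(picture_markup("img/sunset", "Sunset over a harbour with two fishing boats",
                         [480, 960], "(max-width: 600px) 480px, 960px", 960, 640))
```

The explicit `width` and `height` let the browser reserve space before the file arrives, which helps Cumulative Layout Shift. The honest `alt` text travels with every format variant, so whichever file the browser picks, the machine gets the same confirming description.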
Putting it together
So what does good image SEO actually look like now, when the machine can see?
It starts with images that are genuinely worth seeing — real, specific, relevant visuals rather than generic stock photography that the model has already seen ten thousand times and learned to discount. It means describing those images honestly in alt text, so the machine's reading of the pixels gets confirmed by your words and its confidence climbs. It means encoding them in modern formats at a quality that respects both the browser's need for speed and the model's need for clarity. And it means treating images as first-class content — sources of discovery and citation in their own right — rather than as decoration sprinkled around the "real" content of the page.
The deeper point, and the one I want to leave you with, is that the gap between optimizing for humans and optimizing for machines has narrowed almost to nothing on this front. The machine now wants what a person always wanted: a clear, honest, high-quality image that genuinely shows what it claims to show. The age of tricking a blind crawler with keyword-stuffed alt text is over. The machine can see now. The only way to win is to give it something genuinely worth looking at.
If you would like to know how your site's images are read by the AI systems now doing the looking — and where blurry compression or lazy alt text is quietly costing you visibility — that is exactly the kind of audit Licheo runs. Contact us, and we will show you what the machine sees when it looks at your pages.