Almost everyone can remember a time when their eyes were glued to one of these:
… a “Where’s Waldo” puzzle, where one has to find the brightly clad “Waldo”. Waldo wears a red-and-white striped shirt, and most of his clothing is red. When your eye scans across this image, your brain examines one region at a time, looking for Waldo’s distinguishing red-striped pattern.
But let’s say you had to find Waldo in under 2 seconds. Obviously, such a situation is incredibly unlikely to exist, but for the sake of experimentation we’ll consider it. The defining characteristics of Waldo could quite easily be encapsulated in a pair of image processing functions: one that looks for stripey things, and one that looks for red things. Defining an automated parser that looks for these characteristics would allow us to highlight the regions of the image where Waldo is more likely to be.
Why would one do this? Well, with great computational abilities come great ideas.
The characteristics of a Waldo image
First, we need to think about which characteristics define Waldo and which define the image as a whole, so that our image parser can look for the former while ignoring the latter. The most prevalent characteristics are:
- Bright red clothing – so our image parser must look for regions that have a higher red content, as these would be the most likely regions to contain Waldo.
- Striped shirt – so our image parser must look for regions with more morphological components (i.e. more complex regions that contain many distinct-looking objects), as these regions would be more likely to contain Waldo.
- Small, nonuniform appearance – Waldo occupies only a small part of the image, so our image parser must check these characteristics over small regions.
Using these ideas, we can define an image parser that scans through our Waldo image and gives a pretty good idea of where Waldo is most likely located.
Defining the parser
For simplicity’s sake, I am again using Mathematica to define this parser. However, writing it using the JAI (Java Advanced Imaging) API would allow us to take advantage of some lower-level functionality that could potentially allow us to single out Waldo altogether. Since I don’t really feel like writing more than a few lines of code, I’m using Mathematica (which is also why I do a lot of things in Mathematica).
Using ImagePartition, we can divide the image into smaller sub-regions.
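To make the partitioning step concrete, here is a minimal sketch in plain Python rather than the Mathematica used for this post. It assumes the image is a list of rows of pixels. The function name `partition` is made up for illustration, and dropping incomplete tiles at the right and bottom edges is an assumption of this sketch.

```python
def partition(image, tile):
    """Split a 2-D grid of pixels into a grid of tile x tile blocks.

    Incomplete blocks at the right and bottom edges are dropped
    (an assumption of this sketch).
    """
    rows = len(image) // tile
    cols = len(image[0]) // tile if image else 0
    return [
        [
            [row[c * tile:(c + 1) * tile]
             for row in image[r * tile:(r + 1) * tile]]
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Tiny 4x6 "image" whose pixels are (y, x) labels, to make the split visible.
image = [[(y, x) for x in range(6)] for y in range(4)]
tiles = partition(image, 2)
print(len(tiles), len(tiles[0]))  # 2 3
print(tiles[0][1])                # [[(0, 2), (0, 3)], [(1, 2), (1, 3)]]
```

Smaller tiles localize Waldo more precisely, but each tile then contains less of his shirt to score, so the tile size is one of the constants worth tuning.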
We’ll use the original image to calculate red content for the parser. To locate regions with a higher occurrence of distinct objects (such as a man wearing a striped shirt), we apply a transformation to these regions.
The parser will go through each region and assign it two numeric indices, one for its morphological content and one for its color content, both weighted toward regions with more objects and more red.
Using numeric data such as this, we can define a function that threads the two indices together and uses a simple blurring function to blur out regions that don’t match the criteria for containing Waldo. This function took the following Where’s Waldo image:
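The two per-region indices could look roughly like the Python sketch below. This is not the Mathematica code behind the post: `red_score`, `component_count`, and the brightness threshold of 128 are all assumptions chosen for illustration. Red content is measured as how far the red channel exceeds the other two, and "morphological content" is approximated by binarizing the tile and counting 4-connected groups of dark pixels.

```python
def red_score(tile):
    """Mean red excess in [0, 1]: how far R exceeds the larger of G and B."""
    pixels = [p for row in tile for p in row]
    excess = sum(max(0, r - max(g, b)) for r, g, b in pixels)
    return excess / (255 * len(pixels))


def component_count(tile, threshold=128):
    """Count 4-connected groups of dark pixels after a simple binarization."""
    h, w = len(tile), len(tile[0])
    # A pixel is "dark" when its mean channel value is below the threshold.
    dark = [[sum(p) / 3 < threshold for p in row] for row in tile]
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if not dark[y][x] or seen[y][x]:
                continue
            count += 1
            seen[y][x] = True
            stack = [(y, x)]
            while stack:  # flood fill one component
                cy, cx = stack.pop()
                for ny, nx in ((cy + 1, cx), (cy - 1, cx),
                               (cy, cx + 1), (cy, cx - 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and dark[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
    return count


RED, WHITE = (255, 0, 0), (255, 255, 255)
striped = [[RED] * 3, [WHITE] * 3, [RED] * 3, [WHITE] * 3]
blank = [[WHITE] * 3 for _ in range(4)]

print(red_score(striped), component_count(striped))  # 0.5 2
print(red_score(blank), component_count(blank))      # 0.0 0
```

In this toy example the striped tile scores higher on both indices than the blank one, which is the behavior the parser relies on.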
And, using the image parser defined above, produced:
Is it easier to find Waldo now? (If you’re still stuck, Waldo is located near the top left corner, in one of the un-blurred sections. Only his shirt is visible.)
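The combine-and-blur step can be sketched in Python as follows, again as an illustration rather than the code used for the images in this post. The names `mask_tiles` and `flatten_tile` and the threshold values are hypothetical. The blur is replaced here by the crudest possible stand-in: a rejected tile is filled with its own mean colour.

```python
def flatten_tile(tile):
    """Crude stand-in for a blur: fill the tile with its mean colour."""
    pixels = [p for row in tile for p in row]
    n = len(pixels)
    mean = tuple(sum(p[i] for p in pixels) // n for i in range(3))
    return [[mean for _ in row] for row in tile]


def mask_tiles(tiles, red, objects, min_red=0.2, min_objects=2):
    """Keep tiles that pass both thresholds and flatten all the others.

    `red` and `objects` hold one score per tile, in the same grid layout
    as `tiles`. The default thresholds are arbitrary illustrative values.
    """
    return [
        [
            tile
            if red[r][c] >= min_red and objects[r][c] >= min_objects
            else flatten_tile(tile)
            for c, tile in enumerate(row)
        ]
        for r, row in enumerate(tiles)
    ]


RED, WHITE, BLACK = (255, 0, 0), (255, 255, 255), (0, 0, 0)
striped = [[RED, RED], [WHITE, WHITE]]
checker = [[BLACK, WHITE], [WHITE, BLACK]]

# One row of two tiles, with made-up scores for each.
result = mask_tiles([[striped, checker]],
                    red=[[0.5, 0.0]],
                    objects=[[2, 2]])
print(result[0][0] == striped)  # True: the striped tile is left alone
print(result[0][1][0][0])       # (127, 127, 127): the checker tile is flattened
```

Requiring both scores to pass is only one way to thread them together. A weighted sum with a single cutoff would be another, and it would tolerate a tile that is very red but not especially busy.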
Update: After tweaking the function and the way the color components were being analyzed, I was able to get it to perform a much more accurate search for a red striped object. I learned that Mathematica’s MorphologicalComponents function returns an array in which each pixel is labeled with the index of the connected component it belongs to, so using those numbers allowed me to make a much better estimate of what a series of red stripes looks like. With these modifications, I was able to make:
This time, it singled out Waldo (with one false positive in the bottom row, near the middle). This was made with some minor modifications to the original algorithm above. Reducing the number of false positives was surprisingly easy once I knew what I should be looking for and what I shouldn’t, and heuristic tweaking of constants is always helpful.
Justifications and downfalls of this approach
The upside to this approach is that it is pretty good at eliminating large regions of the image where Waldo cannot be. However, it does not point out regions where Waldo definitely is, only regions where he is likely to be. A much better approach would be to define a parser that looks for a striped red pattern, which is the primary feature distinguishing Waldo from the rest of the image. Also, if you compare the first and second images, there are a lot of interesting false positives that, intuitively, have neither high morphological content nor high red content. This indicates that the threshold values the parser uses when blurring still need to be adjusted and optimized.
On the flip side, this approach did effectively cut down the amount of area our eye has to search, and most people who looked at the output image could find Waldo within about 5 seconds. Hopefully, an implementation of a more complicated approach will be coming soon.
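As a rough illustration of the striped-red-pattern idea mentioned above, the Python sketch below counts how often a column of pixels switches between red and white. It is one possible reading of that idea, not the approach used in part 2. The colour cutoffs in `classify` and the function names are assumptions.

```python
def classify(pixel):
    """Label a pixel 'red', 'white', or None (cutoffs are illustrative)."""
    r, g, b = pixel
    if r > 150 and g < 100 and b < 100:
        return "red"
    if min(r, g, b) > 180:
        return "white"
    return None


def stripe_transitions(column):
    """Count red<->white switches down a column of pixels.

    Runs of the same colour collapse into one stripe, and any other
    colour breaks the sequence so it cannot bridge two stripes.
    """
    count = 0
    previous = None
    for pixel in column:
        label = classify(pixel)
        if label is None:
            previous = None
        elif label != previous:
            if previous is not None:
                count += 1
            previous = label
    return count


def stripe_score(tile):
    """Best transition count over all columns of the tile."""
    width = len(tile[0])
    return max(stripe_transitions([row[x] for row in tile])
               for x in range(width))


RED, WHITE, BLUE = (220, 30, 40), (245, 245, 245), (30, 60, 200)
shirt = [[RED] * 2, [WHITE] * 2] * 3   # six rows: red, white, red, white, ...
wall = [[RED] * 2 for _ in range(6)]   # solid red, no stripes

print(stripe_score(shirt))  # 5
print(stripe_score(wall))   # 0
```

A solid red object scores zero here even though it would score highly on red content alone, which is exactly the kind of false positive the region-based parser struggles with. This sketch only detects horizontal stripes, so a tilted or partly hidden Waldo would need a more forgiving test.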
Update: After adjusting the algorithm, I got it to work much better. Check out part 2.