Editor’s note: This 2012 5-in-5 article is by IBM’s John
Smith, senior manager, Intelligent Information Management.
They say a picture is worth a thousand words, but to a
computer, it’s just thousands of pixels. Within the next five years,
IBM Research thinks that computers will not only be able to look at images, but
help us understand the 500 billion photos we’re taking every year (that’s about
78 photos for each person on the planet).
Getting a computer to see
The human eye processes images by parsing colors and looking
at edge information and texture characteristics. In addition, we can understand
what an object is, the setting it’s in and what it may be doing. While a human
can learn this rather quickly, computers traditionally haven’t been able to
make these determinations, instead relying on tags and text descriptions to
determine what the image is.
One of the challenges of getting computers to “see” is that
traditional programming can’t replicate something as complex as sight. But by
taking a cognitive approach, and showing a computer thousands of examples of a
particular scene, the computer can start to detect patterns that matter,
whether it’s in a scanned photograph uploaded to the web, or some video footage
taken with a camera phone.
Let’s say we wanted to teach a computer what a beach looks
like. We would start by showing the computer many examples of beach scenes. The
computer would turn those pictures into distinct features, such as color
distributions, texture patterns, edge information, or motion information in the
case of video. Then the computer would begin to learn how to discriminate beach
scenes from other scenes based on these different features. For instance, it
would learn that certain color distributions are typical of a beach scene,
whereas a downtown cityscape is distinguished from other scenes by certain
distributions of edges.
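To make the idea concrete, here is a minimal sketch of that learning loop. It is not IBM's system: the synthetic "images," the color palettes, the coarse color histogram, and the nearest-centroid rule are all assumptions chosen to keep the example small, and it uses only color distributions (no texture, edge, or motion features).

```python
import random
from math import dist

BINS = 4  # coarse bins per color channel (an illustrative choice)

def color_histogram(pixels, bins=BINS):
    """Turn a list of (r, g, b) pixels into a normalized color distribution."""
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        index = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[index] += 1.0
    return [count / len(pixels) for count in hist]

def centroid(vectors):
    """Average several feature vectors element by element."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def train(examples):
    """Learn one average color distribution per scene label."""
    return {
        label: centroid([color_histogram(image) for image in images])
        for label, images in examples.items()
    }

def classify(model, image):
    """Pick the label whose learned distribution is closest to the image's."""
    features = color_histogram(image)
    return min(model, key=lambda label: dist(features, model[label]))

# Hypothetical palettes standing in for real photos: sand and sky for a
# beach, grey concrete and dark shadow for a city.
BEACH_PALETTE = [(230, 200, 150), (90, 170, 240)]
CITY_PALETTE = [(110, 110, 110), (40, 40, 50)]

def make_image(palette, rng, n_pixels=200):
    """Generate a synthetic image by jittering colors drawn from a palette."""
    def clamp(value):
        return max(0, min(255, value))
    image = []
    for _ in range(n_pixels):
        base = rng.choice(palette)
        image.append(tuple(clamp(c + rng.randint(-20, 20)) for c in base))
    return image

rng = random.Random(0)
examples = {
    "beach": [make_image(BEACH_PALETTE, rng) for _ in range(20)],
    "city": [make_image(CITY_PALETTE, rng) for _ in range(20)],
}
model = train(examples)

# Classify an image the model has not seen before.
print(classify(model, make_image(BEACH_PALETTE, random.Random(1))))
```

A real system would learn from thousands of actual photos and combine many kinds of features, but the shape is the same: turn each example into features, summarize what each kind of scene tends to look like, and compare new images against those summaries.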
Once the computer learns this kind of basic discrimination, we
can then go a step further and teach it about more detailed activities that
could happen within the beach scene: we could introduce a volleyball game or
surf competition at the beach. The system would continue to build on these
simpler concepts of what a beach is to the point that it may be able to
distinguish different beach scenes, or even discern a beach in France from one
in California. In essence, the machine will learn the way we do.
Helping doctors see diseases before they occur
In the medical field, where diagnoses come from MRI, X-ray
and CT images, cognitive visual computing can play an important role in helping
doctors recognize issues such as tumors, blood clots, or other problems sooner.
Often what's important in these images is subtle and microscopic, and requires
careful measurement. Using the pattern recognition techniques described above,
a computer can be trained to effectively recognize what matters most in these
images.
Take dermatology. Patients often have visible symptoms of
skin cancer by the time they see a doctor. Given many images of patients from
scans over time, a computer could then look for patterns and identify
situations where there may be something pre-cancerous, well before melanomas
Share a photo – get
It’s not only images from specialized devices that are
useful. The photos we share and like on social networks, such as Facebook and
Pinterest, can provide many insights. By looking at the images that people share
or like on these social networks, retailers can learn about our preferences – whether
we’re sports fans, where we like to travel, or what styles of clothing we like
– to deliver more targeted promotions and offer individualized products and
getting promotions for kitchen gadgets or even certain kinds of food based on
the images pinned to your “Dream Kitchen” Pinterest board.
Using Facebook photos to save lives
photos on social networks is not only beneficial for retailers and marketers;
it could also help in emergency management situations. Photos of severe storms
– and the damage they cause, such as fires or electrical outages – uploaded to
the web could help electrical utilities and local emergency services to
determine in real time what’s happening, what the safety conditions are and where
to send crews. This same type of analysis could also be done with security
cameras within a city. By aggregating all of the video data, police datacenters
could analyze and determine possible security and safety issues.
five years, computers will be able to sense, understand, and act upon these large
volumes of visual information to help us make better decisions and gain
insights into a world they couldn’t previously decipher.
IBM thinks these cognitive systems will connect to all of
our other senses. You can read more about taste, smell, hearing, and touch technology
in this year’s IBM 5 in 5.