The buzz around artificial intelligence, or AI, has grown steadily louder over the past year. We've never been closer to unlocking the benefits of this technology. 2016 will see new kinds of AI-powered devices as we make progress on one of the most difficult challenges in AI: getting our devices to understand what they are seeing.

Why would a machine need to see? Vision is a primary sense and one of the main ways we experience the world. For machines to relate to humans and provide the support we need, it is imperative that they can observe and act in the visual realm. This can take the form of a small camera that helps a blind person "see" and contextualize the world around them, or a home surveillance system that can correctly tell the difference between a stray cat, moving tree branches, and a burglar.

As devices play an increasingly integral part in our daily lives, we have seen a growing number of applications fail for lack of adequate visual capabilities, including midair drone collisions and robot vacuums that "eat" things they shouldn't.

Machine vision, a rapidly growing branch of AI that aims to give machines sight comparable to our own, has made massive strides over the past few years, thanks to researchers applying specialized neural networks to help machines identify and understand images from the real world. Since that starting point in 2012, when deep convolutional neural networks swept the ImageNet image-recognition competition, computers have become capable of everything from identifying cats on the Internet to recognizing specific faces in a sea of photos, but there is still a long way to go. Today, we're seeing machine vision leave the data center and be applied to everything from autonomous drones to sorting our food.

A common analogy for understanding machine vision versus our own is the comparison between the flight of birds and that of airplanes. Both ultimately rely on fundamental physics (e.g., Bernoulli's principle) to lift them into the air, but that doesn't mean a plane flaps its wings to fly. People and machines may look at the same things, and the way those images are interpreted may even have some commonalities, but the final results can still be vastly different.

While basic image classification has become far easier, when it comes to extracting meaning or information from abstract scenes, machines face a whole new set of problems. Optical illusions are a great example of how far machine vision still has to go.

Everyone is probably familiar with the classic illusion of two silhouettes facing one another. When a person looks at this image, they aren't limited to seeing abstract shapes; their brain supplies further context, allowing them to pick out two different readings of the same image: two faces or a vase.

When we run this same image through a classifier (you can find several free ones on the Internet), we quickly realize how hard it is for a machine to understand. A basic classifier doesn't see two faces or a vase; instead, it sees things like a hatchet, a hook, a bulletproof vest, and even an acoustic guitar. While the system is admittedly uncertain that any of those things are actually in the image, it shows just how challenging this problem can become.
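If you want to try this experiment yourself, the sketch below shows the general idea. It assumes a freely available pretrained ImageNet model (torchvision's ResNet-50 here) rather than whichever specific classifier was used for the article, and the file name "illusion.jpg" is just a placeholder.

```python
# A minimal sketch of the "run it through a free classifier" experiment.
# Assumptions: torchvision's pretrained ResNet-50 stands in for the
# unspecified free classifier, and "illusion.jpg" is a placeholder path.
import torch
from torchvision import models
from PIL import Image

weights = models.ResNet50_Weights.IMAGENET1K_V2
model = models.resnet50(weights=weights)
model.eval()

# Standard ImageNet preprocessing (resize, center-crop, normalize).
preprocess = weights.transforms()
image = Image.open("illusion.jpg").convert("RGB")
batch = preprocess(image).unsqueeze(0)  # shape: [1, 3, 224, 224]

with torch.no_grad():
    probs = model(batch).softmax(dim=1)[0]

# Print the model's five best guesses with their confidence scores.
top5 = probs.topk(5)
for score, idx in zip(top5.values, top5.indices):
    print(f"{weights.meta['categories'][idx]}: {score.item():.1%}")
```

For an ambiguous image like this one, the top predictions are typically a handful of low-confidence guesses at everyday objects, which is exactly the behavior described above: the model can only label the image with categories it was trained on, not reinterpret the scene the way a person does.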

This problem becomes even more difficult if we look at something more complicated, like a painting by Beverly Doolittle. While not everyone who sees this image will be able to spot every face on the canvas, they will almost instantly see that there is more to the picture than meets the eye.

Running this image through the same classifier, the results run the gamut from the plausible, like a valley or a stone wall, to the completely off-base, like Grifola frondosa (a type of mushroom) or an African chameleon. While the classifier can grasp the general sense of the image, it fails to see the hidden faces within the picture.

To understand why this is such a challenge, you need to consider why vision is so complex. Just like these images, the world is a messy place. Navigating it isn't as simple as building an algorithm to parse through data; it requires experience and an understanding of real situations that lets us act accordingly.

Robots and drones face a myriad of obstacles that fall outside the norm, and figuring out how to overcome these challenges is a priority for those looking to capitalize on the AI revolution.

With the continued adoption of technologies like neural networks and specialized machine vision hardware, we are rapidly closing the gap between human and machine vision. One day soon, we may even start to see robots with visual capabilities going above and beyond our own, enabling them to carry out numerous complex tasks and operate completely autonomously within our society.

Remi El-Ouazzane is CEO of Movidius, a startup combining algorithms with custom hardware to provide visual intelligence to connected devices.
