Computer vision and image recognition are integral parts of artificial intelligence (AI), which has quickly gone from niche to mainstream in the past few years. And nowhere was this more evident than at CES 2017 earlier this month. From a few days of wandering the floor, here are some of the coolest new uses of computer vision.
1. Self-driving cars
The biggest displays of computer vision are coming from the automotive industry, because computer vision is, after all, one of the central enabling technologies of semi- and fully autonomous cars. NVIDIA, which already helped supercharge the deep learning revolution with its GPU tools, is powering many of the autonomous car innovations with the NVIDIA Drive PX 2, a self-driving car reference platform that Tesla, Volvo, Audi, BMW, and Mercedes-Benz are already using for semi- and fully autonomous functions. Its DriveNet AI perception technology pairs neural-network-trained computer vision with a full sensor suite (everything from lidar and radar to ultrasonic sensors and multiple cameras) to perceive the environment around the car: lane markings, vehicles, and more. Coupled with the new Xavier AI car supercomputer, this platform underpins NVIDIA and Audi's plan to put a fully autonomous car on the road by 2020.
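To make the perception piece concrete, here is a minimal sketch of single-camera object detection, the kind of building block a platform like Drive PX 2 runs at far larger scale across many sensors. It uses a pretrained, general-purpose detector from torchvision purely as a stand-in for NVIDIA's proprietary networks; the input file name is hypothetical.

```python
# Minimal sketch: detect vehicles in one dashcam frame with a pretrained
# detector. Illustrative only; not NVIDIA's actual stack.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

frame = Image.open("dashcam_frame.jpg").convert("RGB")  # hypothetical input
with torch.no_grad():
    detections = model([to_tensor(frame)])[0]

# Keep confident detections (in the COCO label map used here, 3 = car).
for box, label, score in zip(detections["boxes"], detections["labels"],
                             detections["scores"]):
    if score > 0.8:
        print(f"class={label.item()} score={score:.2f} box={box.tolist()}")
```

A production system would run quantized networks on dedicated silicon and fuse detections with radar and lidar returns, but the input-to-labeled-boxes flow is the same.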
[aditude-amp id="flyingcarpet" targeting='{"env":"staging","page_type":"article","post_id":2159764,"post_type":"guest","post_chan":"none","tags":null,"ai":false,"category":"none","all_categories":"business,","session":"A"}']More real-world solutions are available from NVIDIA AI Co Pilot, which uses face recognition, lip reading, gaze tracking, and natural language to make assisted driving better. So, for example, computer vision-enabled lip reading is used to enhance natural language recognition, while gaze tracking (a mix of eye-, face-, and head-tracking) doesn’t just inform drivers when they’re dozing but also picks up hard-to-see situations like a motorcycle suddenly approaching between lanes behind you. In the case of lip reading, deep learning networks are now capable of reading lips with 95 percent accuracy, versus humans with only three percent (hello, HAL 9000). It’s a superhuman power, for now being used to make voice recognition just a little bit more accurate if, say, it’s too noisy in a car. These are subsets of AI working together.
2. Personalization
In the future, custom car settings will get a whole lot better thanks to facial recognition. The Panasonic Chrysler Portal concept features cameras behind the steering wheel and outside the car that use computer vision to recognize the driver, even before they reach the vehicle, and update music preferences, seat position, temperature, and so on. And it's not just the driver who is recognized: passengers, too, can have face-recognition-enabled personalization settings (seating, temperature, even noise-cancellation “cocoons” playing individual music preferences) that update automatically. This is cool for personal cars, of course, but imagine how it might transform ride-sharing services like Uber or Lyft.
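Under the hood, this kind of personalization is usually a face-embedding lookup: compare the camera's view of an approaching person against enrolled profiles and apply the closest match's settings. Here is a minimal sketch using the open-source face_recognition library as a stand-in for the embedded stack a production vehicle would run; the image files and preference fields are hypothetical.

```python
# Minimal sketch of face-keyed personalization. Illustrative only.
import face_recognition
import numpy as np

# Hypothetical enrolled profiles: name -> (face embedding, preferences).
profiles = {
    "alice": (
        face_recognition.face_encodings(
            face_recognition.load_image_file("alice_enroll.jpg"))[0],
        {"seat_position": 7, "temperature_c": 21, "playlist": "morning-jazz"},
    ),
}

frame = face_recognition.load_image_file("door_camera.jpg")  # hypothetical capture
encodings = face_recognition.face_encodings(frame)
if encodings:
    names = list(profiles)
    known = np.array([profiles[n][0] for n in names])
    distances = face_recognition.face_distance(known, encodings[0])
    best = int(np.argmin(distances))
    if distances[best] < 0.6:  # the library's customary match threshold
        name = names[best]
        prefs = profiles[name][1]
        print(f"Welcome {name}, applying {prefs}")
```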
3. Interfaces
Computer vision-powered eye-tracking technology has moved beyond gaming laptops into consumer and business computers, providing control for users who can't use their hands. The Tobii Dynavox PCEye Mini featuring IS4 is about the size of a ballpoint pen, making it an ideal and unobtrusive accessory for tablets, laptops, and more. Tobii's eye tracking is also spreading into mainstream hardware: at CES the company detailed its existing relationship with Acer, including integration into the new Predator gaming line, and its technology is built into Alienware machines, including the new Alienware 17 gaming notebook, as well as some Huawei smartphones.
Meanwhile, gesture control, which uses computer vision to track specific hand movements, continues to expand, particularly in cars; both BMW and VW are putting it into future models. The latter's HoloActive Touch interface, whereby users control virtual 3D screens and buttons in the space in front of the dashboard, is a very basic version of Iron Man's hologram interface made real (thanks especially to the included haptic feedback). The prospect of gesture recognition scaling to any device is further aided by solutions such as ManoMotion, which can bring gesture and 3D virtual object control to any device with a regular 2D camera, no hardware upgrades needed. And like a Tobii PCEye Mini for gesture control, eyeSight's Singlecue Gen 2 uses computer vision (gesture recognition, facial analysis, action recognition, and more) to control everything from TVs and cable boxes to smart lights and thermostats.
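What makes solutions like ManoMotion plausible is that useful hand tracking no longer needs depth hardware. Here is a minimal sketch using Google's open-source MediaPipe Hands on an ordinary webcam; the "hand raised" rule and its 0.2 margin are illustrative assumptions, not ManoMotion's algorithm.

```python
# Minimal sketch: 2D-camera hand tracking with MediaPipe Hands.
import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(max_num_hands=1)
cap = cv2.VideoCapture(0)  # any ordinary webcam; no depth hardware needed

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        lm = results.multi_hand_landmarks[0].landmark
        # Landmark 8 is the index fingertip, 0 is the wrist (y grows downward).
        if lm[8].y < lm[0].y - 0.2:
            print("gesture: hand raised")
    cv2.imshow("camera", frame)
    if cv2.waitKey(1) == 27:  # Esc to quit
        break
cap.release()
```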
Indiegogo-funded Hayo may be the most intriguing new interface of all. It lets you create virtual controls around your house (a volume control in mid-air that you adjust by raising or lowering your hand, a light switch on a kitchen countertop), activated by a computer vision-enabled cylindrical device with a built-in camera and 3D, infrared, and motion sensors.
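Stripped to its essence, a mid-air or countertop control is a region of the camera frame that triggers an action when something enters it. The toy sketch below does this with plain 2D frame differencing; Hayo's actual device adds depth and infrared sensing, and the region coordinates and threshold here are assumptions.

```python
# Toy "virtual switch": fire when enough of a fixed region changes.
# Stop with Ctrl+C. Illustrative only.
import cv2

ROI = (200, 150, 120, 120)  # x, y, w, h of the virtual "switch" (assumed)
cap = cv2.VideoCapture(0)
background = None

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    x, y, w, h = ROI
    patch = cv2.cvtColor(frame[y:y+h, x:x+w], cv2.COLOR_BGR2GRAY)
    patch = cv2.GaussianBlur(patch, (21, 21), 0)
    if background is None:
        background = patch  # first frame becomes the empty-scene reference
        continue
    delta = cv2.absdiff(background, patch)
    mask = cv2.threshold(delta, 25, 255, cv2.THRESH_BINARY)[1]
    if cv2.countNonZero(mask) > 0.3 * w * h:  # region mostly changed: "touched"
        print("virtual switch activated")
cap.release()
```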
4. Appliances
Expensive refrigerator cams that simply show you video footage of what's in your refrigerator aren't that revolutionary, but retrofitting your old refrigerator with an after-market camera and an app that uses image recognition to tell you when you're running out of certain items is genuinely useful. Besides streaming live video of your food to your smartphone, the sleek Smarter FridgeCam, which attaches to the back wall inside your refrigerator, uses image recognition to detect expiration dates and identify what's in your fridge (and even recommend recipes based on said foodstuffs). At $100, it's shockingly accessible.
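The date-reading piece boils down to OCR plus pattern matching. Here is a minimal sketch using the open-source Tesseract engine via pytesseract as a stand-in for whatever Smarter actually ships; the label photo is hypothetical.

```python
# Minimal sketch: OCR a label photo and pull out date-like strings.
import re
from PIL import Image
import pytesseract

text = pytesseract.image_to_string(Image.open("milk_label.jpg"))  # hypothetical
# Match common date formats such as 01/28/2017 or 2017-01-28.
dates = re.findall(r"\b(?:\d{1,2}[/-]\d{1,2}[/-]\d{2,4}|\d{4}-\d{2}-\d{2})\b", text)
print("possible expiration dates:", dates)
```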
5. Digital signage
A form of computer vision shows potential to transform banner and other ads in public spaces such as retail stores, museums, stadiums, and theme parks. Panasonic's booth demoed dynamic projection mapping on hanging banners: infrared markers invisible to the human eye, combined with video stabilization, let projected ads move realistically even when the flags are flapping in the wind, as if the ads were actually printed onto the fabric.
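The geometric core of projection mapping is a per-frame homography: track four reference points on the surface, then warp the artwork so it lands exactly on them. A minimal sketch with OpenCV, where the tracked corner coordinates are placeholders for what the infrared markers would report:

```python
# Minimal sketch: warp an ad onto four tracked corners of a moving surface.
import cv2
import numpy as np

ad = cv2.imread("banner_ad.png")                    # hypothetical artwork
h, w = ad.shape[:2]
src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])  # ad corners
# Placeholder marker positions; a real system updates these every frame.
dst = np.float32([[310, 120], [900, 150], [880, 620], [290, 580]])

H = cv2.getPerspectiveTransform(src, dst)
projector_frame = cv2.warpPerspective(ad, H, (1280, 720))
cv2.imwrite("projector_frame.png", projector_frame)  # sent to the projector
```

Panasonic's trick is doing this at a high enough frame rate, with stabilization, that the warp keeps up with flapping fabric.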
6. Smartphones and AR
Much has been written about Pokémon Go being the world's first mass-market augmented reality (AR) app, but like other apps that hop on the AR bandwagon, it relies mainly on GPS and triangulation for its sense of what's in front of you; real computer vision technology has been mostly absent from smartphones. That changed in November, when Lenovo released the Phab 2 Pro, the first phone to use Google's Tango, which employs a mix of sensors and computer vision software to understand what's in pictures, videos, and the real world in real time through the camera lens. At CES, Asus debuted the ZenFone AR, the first Tango-enabled smartphone that also supports Google's Daydream VR. Besides motion tracking, depth perception, and precise positioning, the phone's features are made possible by its Qualcomm Snapdragon 821 processor, which allows computer vision workloads to be distributed across the chip. All this adds up to real, computer vision-enabled AR that's truly based on what's on your phone's camera screen in real time.
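One building block of Tango-style motion tracking is following feature points between camera frames. The sketch below estimates image motion with sparse optical flow in OpenCV; a real system fuses this with inertial and depth data, and the video file here is hypothetical.

```python
# Minimal sketch: estimate camera-relative image motion between two frames.
import cv2
import numpy as np

cap = cv2.VideoCapture("hallway.mp4")  # hypothetical clip
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
points = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                 qualityLevel=0.01, minDistance=8)

ok, curr = cap.read()
curr_gray = cv2.cvtColor(curr, cv2.COLOR_BGR2GRAY)
moved, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, points, None)

# Keep successfully tracked points and summarize their displacement.
flow = (moved - points).reshape(-1, 2)[status.flatten() == 1]
print("median image motion (px):", np.median(flow, axis=0))
```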
[aditude-amp id="medium1" targeting='{"env":"staging","page_type":"article","post_id":2159764,"post_type":"guest","post_chan":"none","tags":null,"ai":false,"category":"none","all_categories":"business,","session":"A"}']
Meanwhile, the Changhong H2, due out later this year, is the first smartphone with a built-in material sensor. Powered by the same technology as Consumer Physics' SCiO pocket molecular sensor, the phone, according to an explainer video on the SCiO website, picks up light “reflected back from an object, breaks it down into a spectrum, and then analyzes its chemical makeup.” Paired with deep learning-trained software, this information can be used for everything from authenticating pills and counting calories in food to identifying skin conditions and calculating body fat levels.
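Once the sensor produces a spectrum, the deep learning part is a fairly conventional classification problem: a vector of intensity readings in, a material label out. A minimal sketch with synthetic stand-in data; the bin count, class names, and model choice are all illustrative assumptions, and Consumer Physics' production models would be trained on large calibrated spectral databases rather than toy data.

```python
# Minimal sketch: classify reflectance spectra. Synthetic data, illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
wavelengths = 331  # e.g., one intensity bin per wavelength band (assumed)

# Fake training data: two "materials" with different mean intensities.
X = np.vstack([rng.normal(loc=peak, scale=5, size=(100, wavelengths))
               for peak in (40, 60)])
y = np.array([0] * 100 + [1] * 100)  # 0 = "sugar", 1 = "sweetener" (illustrative)

clf = RandomForestClassifier(n_estimators=100).fit(X, y)
new_spectrum = rng.normal(loc=60, scale=5, size=(1, wavelengths))
print("predicted material class:", clf.predict(new_spectrum)[0])
```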
7. Cameras
Central to any computer vision platform is the camera, and plenty of new cameras and camera features are getting smarter, offering capabilities beyond human vision. FLIR Systems has released several new thermal imaging cameras that expand heat-sensing computer vision to new areas. Looking just like a GoPro or other action cam, the FLIR Duo and Duo R, for example, can be attached to any drone and used to track heat in all kinds of consumer and business situations, from spotting insulation leaks on a homeowner's roof to aerial surveying of crops or oil fields. Such a camera pairs naturally with software like that of Birds.ai, a Netherlands-based startup that manages and analyzes aerial images for everything from counting and locating crops to identifying defects in wind turbines and power lines.
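The insulation-leak use case shows how simple the downstream computer vision can be once you have thermal data. A minimal sketch that flags unusually hot regions in a thermal image; the file name and the 40-gray-level margin are illustrative assumptions.

```python
# Minimal sketch: outline hot spots in a (grayscale stand-in for) thermal image.
import cv2

thermal = cv2.imread("roof_thermal.png", cv2.IMREAD_GRAYSCALE)  # hypothetical
# Anything well above the scene mean counts as a hot spot (assumed margin).
level = int(thermal.mean()) + 40
_, mask = cv2.threshold(thermal, level, 255, cv2.THRESH_BINARY)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

for c in contours:
    if cv2.contourArea(c) > 50:  # ignore single-pixel noise
        x, y, w, h = cv2.boundingRect(c)
        print(f"possible heat leak at x={x}, y={y}, size={w}x{h}")
```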
8. Robots
Robots are part mechanical wonder, part artificial intelligence. After years of sharing the flying car's fate as an unfulfilled sci-fi promise of the 20th century, they are starting to make headway in both consumer and business markets. Even simple consumer robots with Amazon Alexa or Google Home voice control and digital assistant features, such as the LG Hub and the Mayfield Robotics Kuri, now have basic forms of computer vision, so they can, say, recognize who they're interacting with or rat out your dog if it hops on the couch while you're away. On a more serious note, ITRI's Intelligent Vision System uses deep learning and computer vision to enable robots to identify objects of different sizes (game pieces, coffee cups) and their locations so they can grab and move them around, an indispensable skill for everything from busing restaurant tables to keeping the elderly company over a game of chess.
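The "see it, locate it, grab it" loop ITRI describes can be sketched in a few lines: segment the target object, compute its centroid, and hand that point to the motion planner. The color-based segmentation below is a deliberately simple stand-in for ITRI's learned detectors; the HSV range and image file are assumptions.

```python
# Minimal sketch: find a distinctly colored object and its pick point.
import cv2
import numpy as np

frame = cv2.imread("tabletop.jpg")  # hypothetical robot camera view
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
# Assumed HSV range for a "blue game piece".
mask = cv2.inRange(hsv, np.array([100, 120, 60]), np.array([130, 255, 255]))

contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
if contours:
    target = max(contours, key=cv2.contourArea)
    m = cv2.moments(target)
    if m["m00"]:
        cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]  # centroid = pick point
        print(f"pick point in image coords: ({cx:.0f}, {cy:.0f})")
```

A real system would convert that image point into robot coordinates using depth data and the camera's calibration before planning a grasp.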
Ken Weiner is CTO at GumGum.
[aditude-amp id="medium2" targeting='{"env":"staging","page_type":"article","post_id":2159764,"post_type":"guest","post_chan":"none","tags":null,"ai":false,"category":"none","all_categories":"business,","session":"A"}']