Nokia has been searching for new businesses to break into ever since it retreated from the smartphone business. And after a few years of research, the Finnish company decided to move into virtual reality 360-degree capture cameras.
The company launched its groundbreaking Ozo in March for $60,000, and then it cut the price to $45,000 in August. It is now shipping the devices in a number of markets, and it is rolling out software and services to stoke the fledgling market for VR cameras.
We talked with Guido Voltolina, head of presence capture for Ozo at Nokia Technologies, at the company’s Silicon Valley research facility in Sunnyvale, California. Voltolina discussed the advantage the Ozo has in capturing and processing a lot of data at once, as well as the company’s plans for expansion in VR.
Here’s an edited transcript of our interview.
VentureBeat: Tell me why you moved into making the Ozo VR cameras.
Guido Voltolina: The whole project and division is called Presence Capture. The idea is that, as soon as we identified VR was coming — this was before the Oculus acquisition by Facebook — it was clear that one part of VR would be computer-generated experiences, games being the major example. But as we looked at it, we said, “Wait a minute. If this is a new medium, there will be more than just computer-generated experiences. People will want to capture something — themselves, their life, things happening in the world.”
We had to look at what would be the device that could capture as much data as possible in order to reproduce the sense of presence that VR allows you to have when you’re fully immersed. As a subset of VR, you also have 2D 360 images. That’s still happening. But that’s almost a side effect of solving the major problem we have to solve, these full three-dimensional audiovisual experiences that reproduce the sense of “being there.”
The team started thinking about a device purpose-built for that task. Instead of duct-taping different existing cameras into a rig — many people have done that — we designed a device specifically for the job. The Ozo is not a good 2D camera, but it’s an excellent VR camera. The shape ended up being the same as a skull, very similar dimensions, with the same interocular distance as a human being. It has eight cameras, and the distance between them is very close, with a huge overlap in the lenses’ fields of view. We’re capturing two layers of pixels to feed the right and left eye with the exact interocular distance you’d have yourself. Many rigs have a much wider distance. That creates a problem with objects that are very close to you in VR. The disparity is too great.
With this solution, we then integrated eight microphones, so the spatial audio is captured as the scene is happening. When I’m talking to you here, I have no reason to turn around. In most cases, the only reason we’d turn around is if we heard a loud sound, say from over in that corner. We’re very good at turning exactly at the angle that we thought the sound was coming from, even though we don’t have eyes in the back of our heads. Our ears are very good at perceiving the direction of sound. We integrated both 3D audio and 3D video because the full immersive experience needs both. We’re rarely moved to look around by an object moving around us. The best cue is always sound.
The way 2D movies tell you a story, they know you’re looking at the screen, and they can cut to a different image on the screen as they go, or zoom in and out as a conversation goes back and forth. In VR the audio is the part that has to make you turn to look at someone else or something else.
The concept is capturing live events. People can go to a place that’s normally not accessible to them for whatever reasons — financial reasons, distance, or maybe it doesn’t exist anymore. If something goes crazy and the pyramids in Egypt are destroyed, we’ll never see them again. But if there’s a VR experience of the pyramids, it would be like walking around and seeing the real thing. You can think of it like a time machine aimed at the past. You capture events and then you can go back and revisit them. In 20 years your son or daughter could revisit today’s Thanksgiving dinner, exactly as you did.
VB: Why is this a good move?
Voltolina: It’s very similar to what happened with pictures and video. The first black and white photographs were only accessible to a few. Wealthy people would have family pictures once a year. Now we all have a high-resolution camera in our phones. Video came along and people would hire someone to film a wedding, maybe. Then VHS and digital cameras arrived. But the one doesn’t replace the other. Pictures didn’t replace words and video didn’t replace pictures. We still text. We still share pictures. We still post YouTube videos. Different media for different things.
VR is just another medium. Being a new medium, we focus on how to capture real life in VR. With that, we also have to consider the technology related to carrying and distributing data for playback. After the Ozo we created Ozo Live and Ozo Player. These are software packages we license to companies in order for them to build their own VR players with a higher quality, or to live stream the signal that’s captured by multiple Ozo cameras.
We were at the Austin City Limits concert, for example. A production company there had, I believe, eight Ozos distributed in various positions around the stage. It’s not just one camera. That’s what we were trying at the beginning — the front-row experience, which is great — but I want to go to places I can’t normally access, right? I want to be on stage up there next to Mick Jagger or whoever. I can squeeze thousands of people up there next to him now. In real life, you just couldn’t do that, no matter how much you pay.
VB: How does it differ from the other 360 cameras out there? Facebook showed off a design for one as well.
Voltolina: The majority of the solutions you see announced are a combination of multiple camera modules. They record either to SSD cards or over cables, but there’s one SSD card or one cable per camera. If a camera has 25 modules, you’ll have 25 SSD cards. When you shoot, you don’t really see what you’re shooting through the camera. Then you have to export all the data, stitch it together, and see what comes out.
One of the big differences with Ozo is that, yes, there are eight cameras synchronized together, but we created a brain that takes all this data and combines it in real time. Ozo’s output is one single cable going into either your storage or a head-mounted display. You can visualize what the camera is seeing and direct from its point of view in real time. It’s like a normal viewfinder. For VR cameras, being able to see what the camera is shooting in real time is a key differentiator.
The other key characteristic is that it can operate as a self-contained device with a battery and just one internal SSD card. You can mount it on a drone, on a car, in different situations where you need flexibility and the size has to be compact. It’s about the size of a human head. The unobtrusive design is a big advantage. Some of these rigs with 16 or 25 cameras become quite invasive.
If you want to capture multiple points of view — let’s say you have a rig with 16 cameras, even small ones like GoPros — what if you need seven of those rigs? What if you need to assemble a hundred-some cameras? One of them might malfunction or fail to synchronize or something. Once you start demanding large numbers of cameras, that difference in complexity and reliability becomes significant.