Here we are in mid-2016 and the wait is over: Nearly four years after Palmer Luckey’s Kickstarter campaign for the Oculus Rift, the consumer version (along with stiff competition) is available for purchase.

The current hardware is arguably some of the most polished ever for a first-generation offering, and the new medium has attracted the talent of everyone from triple-A game developers to documentary filmmakers.

That said, the hardware is not without its compromises. While it may be blowing the minds of the dedicated VR crowd, it will need to meet a higher bar before it enjoys the kind of sales volumes we see in the gaming console or smartphone spaces. The good news is that the industry is already moving quickly on more advanced hardware. Here are five things we can look forward to in second-generation VR:

1. Untethered usage

Setting up and dealing with external cameras will be a nuisance in the first wave of devices, but it's not something we'll need to live with forever. The "outside-in" external camera approach will eventually be replaced by a superior "inside-out" method of head tracking. Inside-out tracking places one or more cameras on the headset itself – similar to our own eyes – that can detect motion in the external environment. Not only will this do away with the external cameras perched around your room, it will also untether devices so that VR headsets can be used just about anywhere. Once headsets are untethered with a robust inside-out approach to head tracking, VR will be able to expand to long car trips or plane flights where external cameras can't be set up.


Ideally, the first generation of VR devices would have taken advantage of the more elegant inside-out approach; the limiting factor, however, is how computationally taxing it is. Instead of relying on a controlled set of infrared markers, the device has to figure out the orientation of the user's head by looking at the ever-changing environment of the outside world.

It takes sophisticated computer vision algorithms to understand the movement of every pixel and how the head is moving through space. It’s an approach that is demonstrable today, but in the words of Oculus’ Luckey, “VR-grade inside-out tracking is not currently workable on mobile devices.” Rest assured, though, that with innovations in computer vision and ultra-low power processing, solutions are around the corner. Untethering will be a big step towards the all-important “plug and play” simplicity required for mass adoption.
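
To make the computational challenge concrete, here is a minimal sketch of the kind of computer-vision work involved, using OpenCV feature matching to estimate head rotation between two camera frames. The camera intrinsics and parameters here are illustrative assumptions – a real headset pipeline would fuse this with inertial sensor data at far higher rates.

```python
# A minimal sketch of "inside-out" rotational tracking, assuming OpenCV
# is available and a camera is mounted on the headset. Illustrative only,
# not any vendor's actual tracking pipeline.
import cv2
import numpy as np

K = np.array([[700.0, 0.0, 320.0],   # assumed camera intrinsics
              [0.0, 700.0, 240.0],
              [0.0, 0.0, 1.0]])

orb = cv2.ORB_create(nfeatures=500)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def estimate_head_motion(prev_gray, curr_gray):
    """Estimate headset rotation between two grayscale camera frames."""
    kp1, des1 = orb.detectAndCompute(prev_gray, None)
    kp2, des2 = orb.detectAndCompute(curr_gray, None)
    if des1 is None or des2 is None:
        return None
    matches = matcher.match(des1, des2)
    if len(matches) < 8:
        return None  # not enough texture in the scene to track
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    # The essential matrix encodes the camera's motion between the frames.
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    if E is None:
        return None
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    return R, t  # rotation matrix and (scale-ambiguous) translation
```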

2. Optimized mobile hardware

There may always be a case for a VR device tethered to a dedicated gaming PC for very high-end graphics. But there is also a strong argument for untethered devices that give up some graphical prowess in return for the portability and simplicity of an all-in-one form factor.

Current offerings such as Cardboard and Gear VR offer a mobile form factor, but they essentially repurpose a smartphone to drive a VR experience. While the work companies have done to create VR on a device we already have in our pockets is tremendous, it comes with compromises. Smartphone architecture is designed to sip battery, spinning up CPU resources only when necessary and then quickly returning to a low-power state. VR, on the other hand, needs resources running at a sustained high level for as long as possible.

Asking a smartphone processor to run VR experiences is similar to asking a sprinter to compete in a marathon – they can probably do it, but there are better candidates available.
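
A toy simulation makes the sprinter-versus-marathoner point concrete. The numbers below are invented for illustration – real thermal behavior depends on the specific chip and enclosure – but the shape of the result holds: a bursty workload stays cool at full clock speed, while a sustained one forces the thermal governor to throttle.

```python
# Illustrative-only model of thermal throttling: heating scales with
# clock speed while active, cooling with the gap above ambient.
AMBIENT = 35.0       # baseline temperature, degrees C (assumed)
THROTTLE_AT = 75.0   # limit at which the governor steps in (assumed)
BOOST_GHZ, MIN_GHZ = 2.4, 1.1

def simulate(seconds, duty_cycle):
    """duty_cycle=0.1 approximates bursty phone use; 1.0 a VR render loop."""
    temp, clock = AMBIENT, BOOST_GHZ
    for t in range(seconds):
        active = (t % 10) < duty_cycle * 10
        heat = clock * 10.0 if active else 0.5
        temp += 0.1 * heat - 0.05 * (temp - AMBIENT)
        if temp > THROTTLE_AT:
            clock = max(MIN_GHZ, clock * 0.95)  # governor steps the clock down
    return round(clock, 2), round(temp, 1)

print("bursty phone workload:", simulate(600, duty_cycle=0.1))
print("sustained VR workload:", simulate(600, duty_cycle=1.0))
# The sustained loop ends up throttled well below its boost clock.
```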

A purpose-built device doesn't have the same constraints as a smartphone in terms of form factor, heat dissipation, or battery size. A phone's components must be shoehorned into a 7mm-thick slab of glass and metal, while components in a VR headset can be laid out far more freely to enable longer sustained performance, better cooling, and improved weight distribution. On top of this, dedicated processing solutions will offload the VR workload from the main system. As a result, we may not have to give up as much of the high-end experience as we think.

Standalone headsets will likely fall somewhere between Gear VR performance and that of a high-end PC. Will this be enough to satisfy the PC Master Race? Doubtful, but it may be a level of polish that is good enough to appeal to the critical mass of casual users who aren’t willing to invest in a gaming PC.

3. More convincing 3D

The current generation of VR headsets relies on a technique known as stereoscopy to deliver the sensation of three dimensions. In short, this technique involves displaying a slightly different perspective of a scene to each eye, from which the visual system then extracts depth. This is the same way that the human visual system recognizes depth in the real world – it’s the reason our eyes are spaced apart to provide two ever-so-slightly different perspectives of the same scene.
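
As a rough sketch of the idea – with illustrative numbers and a simplified sign convention; a real engine also applies per-eye asymmetric projection and lens-distortion correction – stereo rendering amounts to drawing the scene twice from two virtual cameras offset by half the interpupillary distance:

```python
# A minimal sketch of stereoscopic rendering. The IPD value and the
# sign convention for the view matrices are assumptions for illustration.
import numpy as np

IPD = 0.064  # average human interpupillary distance, in meters

def eye_view_matrix(head_view, eye):
    """Shift the head's view matrix sideways by half the IPD."""
    offset = np.eye(4)
    offset[0, 3] = (IPD / 2.0) if eye == "left" else (-IPD / 2.0)
    return offset @ head_view

head = np.eye(4)  # head pose from the tracker; identity for this example
left_view = eye_view_matrix(head, "left")
right_view = eye_view_matrix(head, "right")
# Rendering the scene once with each matrix yields the two slightly
# different perspectives from which the visual system extracts depth.
```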

Stereoscopy produces a very convincing illusion of three dimensions from a two-dimensional display. That said, stereoscopy isn't the only way the human brain perceives depth. In fact, if you've ever "lost" the feeling of depth halfway through a 3D movie, it's likely because your brain wasn't receiving all the depth cues it needs to truly believe it isn't looking at a flat cinema screen. The human visual system relies on at least 18 different cues to perceive depth.

Some of these cues are obvious. Occlusion and parallax have been used to convey a sense of depth for a long time, but there are other cues that are much more difficult to reproduce. Take, for example, the convergence cue, where the kinesthetic sensations of the eye muscles focusing at various distances are combined to understand depth. This is a hugely challenging task to replicate in VR. How can we adjust a fixed screen to accommodate the user focusing near or far away within the virtual scene?

Thankfully, the answer appears to lie in eye-tracking technology. Various companies, such as Eyefluence, EyeTribe, and SMI, seem to be working on systems that can detect depth of focus. With this information, a VR system can then re-render a scene according to how far into it the user is looking. The result is the kinesthetic movement of the eye muscles matching the virtual movement of the display. If that's hard to visualize, hold a finger four inches from your nose, then practice bringing your finger into focus, followed by your computer screen. This is currently not possible in VR – the eyes remain focused on the fixed plane of the display, situated just a couple of inches away.
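
The geometry behind this is simple enough to sketch. Assuming a hypothetical eye tracker that reports each eye's horizontal gaze angle (real products expose different APIs), the vergence angle between the two gaze rays gives the depth of the fixation point:

```python
# A hedged sketch of vergence-based depth estimation. The IPD value,
# angle convention, and sample readings are illustrative assumptions.
import math

IPD = 0.064  # interpupillary distance in meters (assumed average)

def fixation_depth(left_yaw_deg, right_yaw_deg):
    """Estimate focus depth from each eye's inward rotation.
    Yaw is measured from straight ahead; converging eyes rotate toward
    the nose (left eye positive, right eye negative here)."""
    vergence = math.radians(left_yaw_deg - right_yaw_deg)
    if vergence <= 0:
        return float("inf")  # gaze rays are parallel: looking far away
    return (IPD / 2.0) / math.tan(vergence / 2.0)

print(fixation_depth(10.0, -10.0))  # finger near the nose: ~0.18 m
print(fixation_depth(0.3, -0.3))    # computer screen: ~6 m
```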

Allowing the eyes to freely focus on objects near and far will bring about a whole new sense of immersion.

4. No-compromise gesture detection

While many VR experiences will work brilliantly with a controller, "lightgun," or gaming wheel, a large number of experiences will still benefit from hand input. Natural human gestures, such as waving, pointing, and grabbing, make a great deal of intuitive sense in VR. Yet the current generation of devices does not natively support gesture detection, despite a number of third-party solutions on the market. The reason: gesture detection still faces some serious challenges.

Firstly, the human brain is extremely sensitive to any mismatch between the articulation of our hands and how they appear in our visual field. The nervous system has a class of receptors known as proprioceptors that detect the position and rotation of joints, and the hand is full of them. This is why we know the position and articulation of our hands even when our eyes are closed. The challenge in VR is that this finely tuned positional system must match up to what is being displayed in the virtual world. Any offset may cause the brain to reject the representation, or contribute to a sense of motion sickness.

Gesture systems therefore need to be both extremely accurate and extremely fast to trick the brain into thinking it's looking at its own hands rather than virtual ones. Luckily, a number of companies are working on high-accuracy, low-latency gesture solutions today, and by the time next-gen VR headsets hit the shelves, there's a very good chance at least some of them will come with gesture control right out of the box.
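
One common trick for masking pipeline latency is worth sketching: extrapolate the tracked hand forward by the time the frame will take to reach the display. The function below is a hypothetical, minimal version assuming constant velocity and an invented 20ms latency figure; real systems use richer motion models.

```python
# A minimal sketch of latency compensation for hand tracking: predict
# where the hand will be when the frame is actually displayed.
# The latency figure and data format are illustrative assumptions.
import numpy as np

def predict_hand_position(positions, timestamps, display_latency=0.020):
    """Extrapolate the latest tracked hand position forward by the
    pipeline latency (here 20 ms) using the last two samples."""
    p0, p1 = np.asarray(positions[-2]), np.asarray(positions[-1])
    velocity = (p1 - p0) / (timestamps[-1] - timestamps[-2])
    return p1 + velocity * display_latency

# Two samples of a hand moving 0.5 m/s along x, taken 10 ms apart:
samples = [(0.100, 0.0, 0.0), (0.105, 0.0, 0.0)]
times = [0.000, 0.010]
print(predict_hand_position(samples, times))  # -> about x = 0.115 m
```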

5. The “social sofa”

Current VR devices work well for closing off the real world and replacing it with a virtual one, but we have to remember that the real world still exists around the user, including obstacles, pets, and siblings. This represents a challenge for VR to become more social and more interactive.

Companies like Sony are taking creative approaches to maintain the “social sofa” of multiplayer experiences with asymmetrical VR/traditional game design, but ultimately the VR user is still closed off from others.

Video pass-through modes, as demonstrated by HTC a few months ago, could be used not only when the user is approaching an obstruction in the room, but also when other people are trying to get your attention or play alongside you, whether or not they have donned a headset themselves. Outside participants can be accurately rendered into a scene as avatars, allowing new ways to mix reality with VR. OptiTrack demonstrated a taste of this in spectacular fashion at GDC this year.

Increasing the level of social gaming, both on the sofa and across locations, will be an important component of creating sustained experiences, adding a replayability factor not found in the canned, linear VR experiences that make up a large proportion of initial content.

A promising future ahead

The features above are not a wishlist. They are all technologies that are arguably ready to be implemented in next-generation devices. These features will improve incrementally on the already impressive experiences that have created so much buzz about VR.

While VR 1.0 is off to an impressive start with a new frontier of content to explore, it's heartening to know that we're only just getting started with this medium. VR is where the smartphone was in 2007: it has hit a key milestone by getting viable hardware to market, but this is only a preview of how ubiquitous the technology will become.

Remi El-Ouazzane is CEO of Movidius, a startup that combines algorithms with custom hardware to provide visual intelligence to connected devices.
