An Old Map and Declarative 3D

This is an early “work in progress” visualization of an 18th-century map drawn by Giambattista Nolli. It uses A-Frame's declarative HTML markup extensions for VR/3D in WebGL, with procedurally generated geometry and lighting baked in Houdini. There's lots more to do and learn. Eventually it will be part of an AR promo piece, but I couldn’t resist.


(better navigation in Cardboard, varied building heights, and better global illumination)


A Tale of Two Frame Rates

I had the good fortune of attending a local SIGGRAPH chapter talk by Bruna Berford of Penrose Studios. She spoke about production methodology and how they approached animation in their beautiful and emotionally compelling VR experience “Allumette”. She gave a good view into the difficulties, challenges, and rewards of adapting to this new medium. To my thinking, they are genuinely embracing the new technology as a narrative medium.

But that’s not what this post is about. This post is about a misconception that people in V/M/AR have about frame rate, specifically the holy grail of a 90+ fps redraw rate. This is held up as a metric that must be achieved for a non-nauseating viewing experience, and it is usually stated as an across-the-board dictum. Allumette, however, threw a very nice wrench into that, one that points to something I’ve tried to articulate in the past. There are two different frame rates going on here, and the difference is apparent in Penrose’s use of stop-motion frame rates for its animation.

The first “frame rate” is the one that’s usually meant. I think of it as the perceptual, or maybe even proprioceptive, frame rate. This is the frame rate that corresponds to how well the environment tracks your body’s movements, for instance when you turn your head or walk around in a room-scale experience. This is the one that tells your lizard brain whether or not you are being lied to. A lot of people, including seasoned veterans, stop here and assume the matter is settled, but I think there’s a second frame rate at work.

The second is what I would call the exterior frame rate: the frame rate of the displayed content. In Allumette this was definitely not 90+ fps. In fact it was a consciously much “slower” and less constant frame rate, because it was animated on hard, held keys with no interpolation to emphasize the poses. The result was an elegant reference to traditional stop-motion animation, with all of its artistic start/stop and a wonderfully surreal sense of time. The overall experience in VR was not so much watching a stop-motion animation as existing in space with one. It was pretty cool.

I might guess the “content” was running at an average of ~12 fps. The display of it, and therefore (more importantly) my perception of the experience, was at the magic 90+ fps. This is an important distinction, especially when it comes to content creation. Would 360 video at a lower playback rate, say 18 fps, give us that old Super 8 home-movie feel, as long as the video sphere it’s projected onto moved seamlessly? Could a game engine environment be optimized to hold frames of animation at 30 fps, using the temporally redundant data to reduce draw calls or GPU memory writes?
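The two rates can be separated in a few lines. This is a minimal illustration, not how Allumette or any particular engine works. It assumes a 90 Hz display loop that resamples head tracking on every redraw and a hypothetical 12 fps held-key animation. The animation's pose index is derived from the display frame with integer math. On redraws where the pose index hasn't changed, the animated geometry could in principle be reused rather than re-uploaded.

```python
from typing import NamedTuple


class FramePlan(NamedTuple):
    display_frame: int   # index of the display redraw (head pose is resampled every time)
    content_pose: int    # index of the held animation key shown on this redraw
    upload_pose: bool    # True only when the held key changes


def plan_frames(num_display_frames: int, display_hz: int = 90,
                content_fps: int = 12) -> list[FramePlan]:
    """Decouple the perceptual (display) rate from the content (animation) rate."""
    plan = []
    last_pose = None
    for frame in range(num_display_frames):
        # Integer math quantizes display time to the content rate without float rounding.
        pose = frame * content_fps // display_hz
        plan.append(FramePlan(frame, pose, pose != last_pose))
        last_pose = pose
    return plan


if __name__ == "__main__":
    plan = plan_frames(90)  # one second of display at 90 Hz
    uploads = sum(p.upload_pose for p in plan)
    print(f"{len(plan)} redraws, {uploads} pose changes")
```

Over one second this prints `90 redraws, 12 pose changes`. The head-tracked view updates on every redraw, while the content changes only 12 times.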

Who will make the merchandise display cases for VR shopping?

Many are convinced that simultaneous, shared, social experiences in VR and other immersive 3D modalities are a foregone conclusion. Regardless of how deluded we might be in this, one thing is clear: for this to scale, we will need a consistent way of describing all of the stuff, much as molecules are a consistent way of describing the real world. Luckily the virtual world is many orders of magnitude simpler than the physical world. Instead of the uncountable trillions of subatomic interactions of matter, the virtual world needs only a truly astounding number of trackable events, handled through a potentially manageable number of protocols and standards.

The problem is that even at many orders of magnitude simpler, the task of consistently describing “anything” is still really, amazingly complicated. We need to be able to share it, sell it, buy it, travel to it, hold it, toss it back and forth, and so on. That is much more complicated than, say, the choice of game engine du jour, OBJ or FBX or Collada, or whether or not you have a cool physics engine. But what really are the basics of virtual matter that need description so that they can be manipulated in the ways we expect? I was thinking about this and came up with a functional, if prosaic, example to get me into a more pragmatic frame of mind than, say, blasting space zombie outlaws.

Let’s assume we have simultaneous social immersive 3D experiences delivered over a common framework. And let’s say that within that space there are millions of stores. And in many of these stores is the virtual equivalent of a merchandise display case. And let’s say your company makes display cases for virtual environments. There are a lot of assumptions here for sure, and the “display case” here is really just a conceptual placeholder for whatever the virtual world might offer up as a kind of “durable good”. But let’s put all that aside for the moment and assume that your business is making virtual display cabinets.

In the real world, display cabinets have certain features that make them more suited for some purposes than others, and yours are very good and specialized. In your case they are jewelry cases with buttons on top that let a shopper rotate the shelves forward and back. You make high-end cabinets that are very durable and come in standard sizes that fit in with other leading retail fixture manufacturers’ products. The doors operate smoothly, giving the sales associate ample access to quickly retrieve even the tiniest items the customer might want. When a business orders cabinets from you, they pay for them, you ship them out, and they are installed and exist physically in place. No one can really duplicate them short of manufacturing a knock-off product.

In a virtual world, retail businesses will want display cabinets, and just like in the real world they won’t normally want to design and manufacture them themselves. They will expect to buy them and for them to simply work. Customers will be able to easily peruse their options and make their choices. They may want to try things on and see how they match the color of their eyes before they buy. Your display cases will have to use the same “trying on” mechanisms as the rest of the display cases in the store, because the store will want to support the latest, most accurate shopping reality-capture avatar system available.

Your display case needs to be installable within the store’s inventory control scheme, and also within the store’s local Cartesian coordinate frame. It needs to be addressable within the store's asset management system, so that stock changes and merchandising decisions can be pushed to the cabinets from central databases. Your cabinets will also need to be backward compatible with one big customer's stock and inventory system. That system is several versions out of date because “they like their system fine the way it is”, but they are a big customer, so you need to keep their business.
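To make those requirements concrete, here is one hypothetical shape such a descriptor might take. Every name in it is invented for illustration: `DisplayCaseDescriptor`, `Placement`, the schema version numbers and the asset ID format. No existing standard is implied. It bundles three of the things the paragraph asks for:

- an address in the store's asset management system;
- a placement in the store's local Cartesian frame;
- the set of inventory-schema versions the case can talk to.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Placement:
    """Position in metres and yaw in degrees, in the store's local Cartesian frame."""
    x: float
    y: float
    z: float
    yaw_deg: float = 0.0


@dataclass
class DisplayCaseDescriptor:
    """Hypothetical record a store's systems might use to install and address a case."""
    asset_id: str                      # address in the store's asset management system
    model: str
    supported_schemas: frozenset[int]  # inventory-schema versions this case understands
    placement: Placement
    shelves: dict[int, str] = field(default_factory=dict)  # shelf number -> SKU on display

    def can_install(self, store_schema_version: int) -> bool:
        # Backward compatibility: an older store system works only if its version is listed.
        return store_schema_version in self.supported_schemas

    def push_merchandising(self, shelf: int, sku: str) -> None:
        # Called when a central database pushes a stock or merchandising change.
        self.shelves[shelf] = sku


if __name__ == "__main__":
    case = DisplayCaseDescriptor(
        asset_id="store-0042/case-07",
        model="rotating-jewelry-case",
        supported_schemas=frozenset({3, 4, 5}),
        placement=Placement(1.2, 0.0, 3.5, yaw_deg=90.0),
    )
    print(case.can_install(3), case.can_install(2))
    case.push_merchandising(1, "SKU-123")
    print(case.shelves)
```

A real standard would also need the harder parts from the following paragraphs, such as the shared “trying on” mechanism and licensing. This sketch deliberately leaves them out.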

So let’s say you’ve now managed to make the ideal display cabinet. It is future-proof, universally accessible and addressable, and fully interoperable. It is backward compatible with old virtual mercantile standards and has compliant e-commerce security features. You still have another issue. How do you make sure the store isn’t copying your display cabinets and using them across all its wholly owned subsidiaries? Or selling them overseas to offset a flat Christmas sales season? Or that they aren’t stolen by a nefarious shopper and resold on the lucrative display cabinet black market?

This is where it’s all about standards. All about the protocols that set out the expected behaviors and configurations that define and prescribe how all of these magical virtual interactions happen. It’s the subatomic glue that connects all the disparate experiences into coherent, navigable places, and continues to do so after the cowboys and star fighters have all gone home.

Spherical Stereo Camera for Immersive Rendering

At SIGGRAPH this year in LA, I was really impressed by a presentation that Mach Kobayashi (currently at Google) gave at the Pixar RenderMan User Group. It covered how to render 360 panorama stills for viewing in VR. He pointed out that the naive approach of placing two panorama cameras side by side breaks down as you turn away from the direction the pair faces. The effective eye separation shrinks, and by the time you look behind you the left and right views have crossed. The stereo version of a broken clock being right twice a day.


He had a great solution: ray-tracing the pictures from the tangent vectors of a cylinder whose diameter is the interpupillary distance. And it works great.
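The cylinder approach is commonly known as omnidirectional stereo (ODS). Below is a minimal sketch of the per-pixel ray, with a few assumptions that are not details from the talk:

- a y-up frame, with longitude 0 looking down +z;
- a 64 mm IPD.

The viewing direction is the ordinary equirectangular one. Each eye's ray origin is pushed sideways onto a circle whose diameter is the IPD, so every ray leaves tangent to that circle.

```python
import math

Vec3 = tuple[float, float, float]


def ods_ray(theta: float, phi: float, eye: int, ipd: float = 0.064) -> tuple[Vec3, Vec3]:
    """Return (origin, direction) for one eye of an omnidirectional stereo panorama.

    theta: longitude in radians (0 looks down +z); phi: latitude in radians (0 is the horizon).
    eye: +1 or -1; which sign is "left" depends on the renderer's handedness.
    """
    cos_phi = math.cos(phi)
    direction = (math.sin(theta) * cos_phi, math.sin(phi), math.cos(theta) * cos_phi)
    r = eye * ipd / 2.0
    # The origin sits on the circle of radius ipd/2, at the point where the ray's
    # horizontal component is tangent to the circle (perpendicular to the radius).
    origin = (r * math.cos(theta), 0.0, -r * math.sin(theta))
    return origin, direction


if __name__ == "__main__":
    for eye in (-1, +1):
        print(eye, ods_ray(math.radians(30), math.radians(10), eye))
```

The two eyes' origins are mirror images through the center, and each origin is perpendicular to its direction. That perpendicularity is why the stereo holds as you turn all the way around.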

But I was left wondering “what happens if you look up or down?” The answer must involve a sphere. So I gave myself a little project: ray-trace from a sphere to get the stereo result from every angle. And it almost works – like 98% works.

The idea is to extend a standard spherical panorama camera by adding an offset to the ray origin without adjusting the viewing direction. The image is rendered one pixel at a time by sweeping the “camera” horizontally in “u” and vertically in “v”. It’s like traveling the Earth by stepping a little to the east and snapping a one-pixel picture of the sky, all the way around. Then you take a step north, take another single-pixel picture, and repeat. Luckily the computer is faster, and the 3D scene database holds still for the shutter.
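The sweep can be sketched as a loop over an equirectangular image. Each pixel center is mapped to a longitude (from u) and a latitude (from v). The ray direction comes from those two angles, and the origin is then offset per eye without touching the direction. The post doesn't spell out how the spherical version picks its offset. This sketch assumes the local east tangent, which is perpendicular to the viewing direction. With that particular choice it coincides with the cylinder method, so the marked line is the place to experiment. `trace` is a hypothetical caller-supplied function standing in for the renderer.

```python
import math
from typing import Callable, Iterator

Vec3 = tuple[float, float, float]


def sweep_rays(width: int, height: int, eye: int,
               ipd: float = 0.064) -> Iterator[tuple[int, int, Vec3, Vec3]]:
    """Yield (column, row, origin, direction) for every pixel of an equirectangular image."""
    half = eye * ipd / 2.0
    for row in range(height):                                    # one step "north/south" per row
        phi = math.pi / 2 - math.pi * (row + 0.5) / height       # latitude: top row near +90 deg
        for col in range(width):                                 # steps "east" around the row
            theta = 2 * math.pi * (col + 0.5) / width            # longitude
            direction = (math.sin(theta) * math.cos(phi),
                         math.sin(phi),
                         math.cos(theta) * math.cos(phi))
            # ASSUMPTION: offset the origin along the local east tangent; direction unchanged.
            origin = (half * math.cos(theta), 0.0, -half * math.sin(theta))
            yield col, row, origin, direction


def render(width: int, height: int, eye: int,
           trace: Callable[[Vec3, Vec3], object]) -> list[list[object]]:
    """Render one eye, one single-pixel "photo" at a time, via a caller-supplied tracer."""
    image: list[list[object]] = [[None] * width for _ in range(height)]
    for col, row, origin, direction in sweep_rays(width, height, eye):
        image[row][col] = trace(origin, direction)
    return image


if __name__ == "__main__":
    # Toy tracer: "sky" if the ray points upward, "ground" otherwise.
    img = render(8, 4, +1, lambda o, d: "sky" if d[1] > 0 else "ground")
    for line in img:
        print(" ".join(line))
```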


Camera assembly scans around a sphere in u and v

I have some more work to do, like trying to integrate live animated “hero” elements with the static stereo background to see if it still holds up to the eye, but this will do for now. The icky pinching at the top of the sky comes from an attractive but non-spherical sky and cloud texture. Here’s a link to a test that’s made for viewing in Cardboard:

Side by side
right eye from spherical projection stereo camera

Experimenting with particles, volumes, and refractive and reflective surfaces is next. It’s promising but tricky: the amount of filtering and pixel sampling is going to take some dialing in.

But there’s a problem. It’s the same problem map makers have when they try to flatten the globe onto a single plane: you end up with mismatched sampling distances and densities toward the poles. In my renders this results in a little “S”-shaped warping when the cameras point too far up or down. The results are still pretty cool, and maybe good enough for many applications, but far from perfect. The other issue is that just as many samples circle the sphere near the poles as at the equator. Whether you see that as too much information in some places or not enough in others, it’s not a very efficient use of rendering resources.
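The pole inefficiency is easy to quantify. In an equirectangular image every row has the same number of pixels, but the latitude band each row covers shrinks toward the poles. A pixel's solid angle is roughly proportional to the cosine of its latitude. The sketch below assumes a standard equirectangular layout with row 0 at the top. It computes each pixel's solid angle exactly from the band it spans. It then compares the pixels actually rendered with the number you would need at a uniform density matched to the equator row.

```python
import math


def pixel_solid_angle(row: int, width: int, height: int) -> float:
    """Solid angle (steradians) of one pixel in `row` of a width x height equirectangular image."""
    lat_top = math.pi / 2 - math.pi * row / height
    lat_bottom = math.pi / 2 - math.pi * (row + 1) / height
    # The band between two latitudes spans 2*pi*(sin(top) - sin(bottom)) sr,
    # shared equally by the row's `width` pixels.
    return 2 * math.pi * (math.sin(lat_top) - math.sin(lat_bottom)) / width


def sampling_efficiency(width: int, height: int) -> float:
    """Pixels needed at uniform equator-row density, as a fraction of pixels actually rendered."""
    equator = pixel_solid_angle(height // 2, width, height)
    total = sum(pixel_solid_angle(r, width, height) * width for r in range(height))  # = 4*pi
    return total / equator / (width * height)


if __name__ == "__main__":
    w, h = 4096, 2048
    ratio = pixel_solid_angle(0, w, h) / pixel_solid_angle(h // 2, w, h)
    print(f"pole pixel / equator pixel solid angle: {ratio:.5f}")
    print(f"sampling efficiency: {sampling_efficiency(w, h):.3f}")
```

As the image height grows, the efficiency tends to 2/π ≈ 0.64. Roughly a third of the rendered samples go into oversampling the polar regions.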

So I’m working on another approach that uses geodesic spheres to derive sample points, rotating the sphere points without rotating the normals. Oh, there will be problems with that too, I’m sure. But this is when I get to fall back on my MFA in painting and let all my higher-math colleagues solve the problems I make for them.
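The geodesic idea can be sketched with the standard icosphere construction. You start from an icosahedron, split each triangle into four at its edge midpoints, and push the new vertices out onto the unit sphere. This is a generic sketch of sample-point generation only. It doesn't attempt the “rotate the points without rotating the normals” step, and it isn't necessarily how the approach above is being built. The vertex count grows as 10 * 4^n + 2 for n subdivisions, and the points are spread far more evenly than equirectangular pixels.

```python
import math

Vec3 = tuple[float, float, float]


def _normalize(v: Vec3) -> Vec3:
    length = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    return (v[0] / length, v[1] / length, v[2] / length)


def icosphere_points(subdivisions: int) -> list[Vec3]:
    """Nearly uniform sample directions on the unit sphere from a subdivided icosahedron."""
    t = (1.0 + math.sqrt(5.0)) / 2.0  # golden ratio
    verts = [_normalize(v) for v in [
        (-1, t, 0), (1, t, 0), (-1, -t, 0), (1, -t, 0),
        (0, -1, t), (0, 1, t), (0, -1, -t), (0, 1, -t),
        (t, 0, -1), (t, 0, 1), (-t, 0, -1), (-t, 0, 1)]]
    faces = [(0, 11, 5), (0, 5, 1), (0, 1, 7), (0, 7, 10), (0, 10, 11),
             (1, 5, 9), (5, 11, 4), (11, 10, 2), (10, 7, 6), (7, 1, 8),
             (3, 9, 4), (3, 4, 2), (3, 2, 6), (3, 6, 8), (3, 8, 9),
             (4, 9, 5), (2, 4, 11), (6, 2, 10), (8, 6, 7), (9, 8, 1)]
    for _ in range(subdivisions):
        midpoint_cache: dict[tuple[int, int], int] = {}

        def midpoint(a: int, b: int) -> int:
            # Shared edges must reuse the same new vertex, so cache by sorted index pair.
            key = (min(a, b), max(a, b))
            if key not in midpoint_cache:
                pa, pb = verts[a], verts[b]
                verts.append(_normalize(((pa[0] + pb[0]) / 2,
                                         (pa[1] + pb[1]) / 2,
                                         (pa[2] + pb[2]) / 2)))
                midpoint_cache[key] = len(verts) - 1
            return midpoint_cache[key]

        new_faces = []
        for a, b, c in faces:
            ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
            new_faces += [(a, ab, ca), (b, bc, ab), (c, ca, bc), (ab, bc, ca)]
        faces = new_faces
    return verts


if __name__ == "__main__":
    for n in range(4):
        print(n, len(icosphere_points(n)))
```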




Freeing Immersive Content Creators from the App Trap

One of the biggest hurdles facing anyone wanting to deliver AR/VR content right now is that every implementation requires its own packaging of content data. Some of this is a result of the “game” and “app” ecosystems these experiences come from, but there’s also simply no alternative yet.

Content cannot be delivered as a broadcast stream because there is no definition of what that stream is. Without that there is no standard viewing “environment” to leverage. There are some attempts to work on this – YouTube’s 360 video is an interesting way of delivering one component of immersive content, but it’s not an extensible or leverageable technology. It’s essentially only a movie player. A content creator cannot, for instance, embed a 360 video as one of many elements in a deliverable program.

And so content creators also have to be technologists, capable of building worlds of mixed elements inside an app or game metaphor. Each experience is a one-off, individually crafted delivery of heterogeneous content. But most of this content is really just reconfigured instances of a handful of kinds of data: 2D, 3D, static, animated, geometry, images, navigable, etc. That repetition could be exploited to create not only a consistent data exchange “format”, but also a consistent experience environment. A content provider would construct not an app or game, but a container of elements and descriptors, deliverable as a “unit” to any compliant experience environment. A broadcast network worked the same way: it delivered TV shows, bounced off satellites, thrown across the airwaves, or sent down cables, to a TV set that decoded and displayed them.
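Nothing like this is standardized. Purely as a hypothetical sketch, such a container could be a manifest listing typed elements, each with a URI and a placement, which any compliant player loads and presents. The element kinds, field names and example package below are all invented. In this sketch a 360 video is just one element among several, which a standalone movie player can't offer. An element of an unknown kind is skipped rather than breaking the whole package.

```python
import json
from dataclasses import dataclass

# Hypothetical element kinds a compliant "experience environment" might understand.
KNOWN_KINDS = {"image", "video360", "mesh", "audio", "text"}


@dataclass
class Element:
    id: str
    kind: str
    uri: str
    position: tuple[float, float, float]  # placement in the package's own frame, metres


def load_package(manifest_json: str) -> tuple[str, list[Element], list[str]]:
    """Parse a package manifest; return its title, presentable elements, and skipped IDs."""
    data = json.loads(manifest_json)
    playable: list[Element] = []
    skipped: list[str] = []
    for raw in data["elements"]:
        if raw["kind"] in KNOWN_KINDS:
            pos = tuple(float(c) for c in raw.get("position", (0.0, 0.0, 0.0)))
            playable.append(Element(raw["id"], raw["kind"], raw["uri"], pos))
        else:
            # Unknown kinds are skipped rather than failing the whole package.
            skipped.append(raw["id"])
    return data["title"], playable, skipped


EXAMPLE = """{
  "title": "Harbor at Dawn",
  "elements": [
    {"id": "sky",   "kind": "video360", "uri": "sky.mp4"},
    {"id": "pier",  "kind": "mesh",     "uri": "pier.obj",  "position": [0, 0, -4]},
    {"id": "gulls", "kind": "audio",    "uri": "gulls.wav", "position": [2, 3, -6]},
    {"id": "ghost", "kind": "hologram", "uri": "ghost.bin"}
  ]
}"""

if __name__ == "__main__":
    title, playable, skipped = load_package(EXAMPLE)
    print(title, [e.id for e in playable], "skipped:", skipped)
```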

But what would that package look like? How can we all agree? What are the NTSC, MPEG, JPEG, OBJ, and WAV of VR? Is it a file? Is it a file aggregation container? There are a lot of questions to answer. But once content creators no longer have to worry about the technology of the viewing experience, they could have the freedom that other creators have had for years. Filmmakers don’t have to worry about the inner mechanical workings of projectors, and writers don’t have to worry about how printing presses work. V/M/AR content creators should not have to worry about writing apps.

The Late 1940s Black and White TV of Virtual Reality Experiences

Everyone seems to be chasing some pretty lofty production goals in VR right now: fully immersive 360 cinematic visual experiences, with full body tracking and gestural input. And that’s great. It’s like the ultimate mind-bending experience. But it overlooks a bigger, more achievable, and more deliverable alternative, which is a lot more like the black-and-white TV of the late 40s.

It’s not as sexy as the hard-wired, high-octane, dedicated immersive pipeline of an 8K surround, best-seat-in-the-house concert experience. Nor is it the subtly expressive and captivating world of an elegantly rendered narrative. But it’s deliverable right now, on Cardboard or a simple smartphone.

If we let go of designing for the future hardware utopia (no, not all of us, and certainly not all of the time), we can make experiences that we can deliver right now. How captivating they are will depend on how well the inherent limitations are embraced and made part of the experiences themselves. It’s like the $9.95 sculpture in design class: what’s the best sculpture you can make for $9.95? It's not the best approximation of the $9,999 sculpture you could have made if the assignment weren’t so damn frustrating. And it's not the $0.99 sculpture either; you get no points for false economies. It's the best you can do while fully embracing the limitation of $9.95.

What can we do with limited resolution, limited bandwidth, limited tracking, limited capture? Can we make a simple experience that is immersive, but not stereo? Can a viewer go to a web page, hold up their smartphone and be inside an engaging experience? What experiences lend themselves most to these design constraints? News? Documentary? Sports? Conversations? Simple telepresence? Standup comedy? Variety shows? We are not at the readily available 8K video experience of VR yet. We aren’t even at the readily available 1950s color NTSC TV experience of VR yet. How do we design compelling experiences for what we do have? There were compelling things on TV when it was black and white, on a tiny round screen, and the image was mostly ghosted, solarized, and smeared. Maybe people were just smarter in the 40s.