A Tale of Two Frame Rates

I had the good fortune of attending a local SIGGRAPH chapter talk by Bruna Berford of Penrose Studios about production methodology and how the studio approached animation in its beautiful and emotionally compelling VR experience “Allumette”. She gave a good view into the difficulties, challenges, and rewards of adapting to work in this new medium. And let me say that, to my thinking, they are fully embracing the new technology as a narrative medium.

But that’s not what this post is about. This post is about a misconception that people in V/M/AR have about frame rate, specifically the holy grail of a 90+ fps redraw rate. This is held up as a metric that must be achieved for a non-nauseating viewing experience, and it is usually stated as an across-the-board dictum. Allumette, however, threw a very nice wrench into that, one that points to something I’ve tried to articulate in the past: there are two different frame rates going on here. And that difference is apparent in Penrose’s use of stop motion frame rates for its animation.

The first “frame rate” is the one that’s usually meant, and I think of it as the perceptual, or maybe even proprioceptive, frame rate. This is the frame rate that corresponds to how well the environment tracks your body’s movements, for instance when you turn your head or walk around in a room-scale experience. This is the one that tells your lizard brain whether or not you are being lied to. A lot of people, including seasoned veterans, stop here and assume the matter is settled, but I think there’s a second frame rate at work.

The second is what I would call the exterior frame rate. This is the frame rate of the displayed content, and in Allumette it was definitely not 90+ fps. In fact it was a deliberately much “slower” and less constant frame rate, because the piece was animated on hard, held keys with no interpolation, to emphasize the poses of the animation. The result was an elegant reference to traditional stop motion animation, with all of the artistic start/stop and a wonderfully surreal sense of time. And the overall experience in VR was not so much watching a stop motion animation as existing in space with one. It was pretty cool.

The “content” was running at what I might guess averaged ~12 fps, but the display of it, and therefore more importantly my perception of the experience, was at the magic 90+ fps. This is an important distinction – especially when it comes to content creation. Would 360 video at a lower playback rate, say 18 fps, give us that old Super8 home movie feel as long as the video sphere it’s projected onto was moving seamlessly? Could a game engine environment be optimized to hold frames of animation at 30 fps, allowing temporally redundant data to limit draw calls or GPU memory writes?
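To make the distinction concrete, here is a minimal Python sketch of the two rates running side by side. The 90 fps display rate and the ~12 fps content rate are the numbers from this post; the function names and the integer bookkeeping are purely my own illustration and say nothing about how Penrose actually built it.

```python
def held_pose_index(display_frame, display_fps=90, content_fps=12):
    """Return which held animation pose is visible on a given display redraw.

    Integer math only: pose k stays on screen until enough redraws have
    passed for pose k + 1 to begin. There is no interpolation between poses.
    """
    return (display_frame * content_fps) // display_fps


def one_second(display_fps=90, content_fps=12):
    """Pose index shown on every redraw during one second of display time."""
    return [held_pose_index(f, display_fps, content_fps)
            for f in range(display_fps)]


poses = one_second()
redraws = len(poses)              # the view is re-rendered this many times
distinct_poses = len(set(poses))  # the animation only changes this many times
```

In a real engine every one of those redraws would still re-render the scene from the latest head pose. Only the animation pose is held, here for seven or eight redraws at a stretch.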

Spherical Stereo Camera for Immersive Rendering

At SIGGRAPH this year in LA, I was really impressed by a presentation that Mach Kobayashi (currently at Google) gave at the PIXAR Renderman User’s Group. It covered how to render 360 panorama stills for viewing in VR. He pointed out that the naive approach of placing two panorama cameras side by side would break as you looked away from the plane the cameras were in, because the views would cross. It’s the stereo version of a broken clock being right twice a day.


He had a great solution involving ray-tracing the pictures from the tangent vectors of a cylinder, where the cylinder width is the inter-pupillary distance. And it works great.
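As I understood the idea, each column of the panorama gets its own ray, and the ray starts on the cylinder wall and leaves along the tangent. Here is a rough Python sketch of that geometry. The function name, the y-up and +z-forward axis convention, and the 0.064 default for the inter-pupillary distance are all my assumptions, not anything taken from the presentation.

```python
import math


def cylinder_stereo_ray(theta, eye, ipd=0.064):
    """Ray for one column of a stereo panorama (a sketch, not production code).

    theta: azimuth in radians; 0 looks down +z with y up (assumed convention).
    eye:   -1 for the left eye, +1 for the right eye.
    ipd:   cylinder diameter in scene units (0.064 is an assumed value).
    """
    # Viewing direction for this column, always horizontal.
    direction = (math.sin(theta), 0.0, math.cos(theta))
    # Horizontal vector perpendicular to the viewing direction.
    right = (math.cos(theta), 0.0, -math.sin(theta))
    # The ray starts on the cylinder wall, half an IPD to one side,
    # which makes the ray tangent to the cylinder.
    half = eye * ipd / 2.0
    origin = (half * right[0], 0.0, half * right[2])
    return origin, direction
```

Because the origin is always perpendicular to the direction, the two eyes stay a full inter-pupillary distance apart no matter which way you turn, which is exactly what the side-by-side cameras fail to do.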


But I was left wondering “what happens if you look up or down?” The answer must involve a sphere. So I gave myself a little project: ray-trace from a sphere to get the stereo result from every angle. And it almost works – like 98% works.

The basic idea is to extend a basic spherical panorama camera by adding an offset to the ray origin position without adjusting the viewing direction. The image is rendered one pixel at a time by sweeping the “camera” horizontally in “u” and vertically in “v”. It’s like traveling the Earth by stepping a little to the east and snapping a one-pixel picture of the sky, all the way around, then taking a step north and repeating. Luckily the computer is faster, and the 3D scene database holds still for the shutter.
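Here is one plausible reading of that sweep as a Python sketch: map each pixel to a longitude “u” and latitude “v”, build the viewing direction from both, and slide the ray origin sideways without touching the direction. The function name, the axis convention, and the choice to keep the offset horizontal are assumptions made for illustration; the actual camera may well differ in its details.

```python
import math


def pixel_to_stereo_ray(px, py, width, height, eye, ipd=0.064):
    """Map one pixel of a latitude-longitude image to a ray (a sketch).

    px, py: pixel coordinates, with row 0 at the top of the image.
    eye:    -1 for the left eye, +1 for the right eye.
    ipd:    eye separation in scene units (0.064 is an assumed value).
    """
    # u sweeps all the way around; v sweeps from straight up to straight down.
    u = ((px + 0.5) / width) * 2.0 * math.pi - math.pi
    v = math.pi / 2.0 - ((py + 0.5) / height) * math.pi

    # Ordinary spherical panorama direction (y up, +z forward at u = 0).
    direction = (math.cos(v) * math.sin(u),
                 math.sin(v),
                 math.cos(v) * math.cos(u))

    # Offset the origin sideways; the viewing direction is left untouched.
    half = eye * ipd / 2.0
    origin = (half * math.cos(u), 0.0, -half * math.sin(u))
    return origin, direction


def render(width, height, eye, trace):
    """Sweep every pixel; `trace` is a hypothetical ray-tracing callback."""
    return [[trace(*pixel_to_stereo_ray(px, py, width, height, eye))
             for px in range(width)]
            for py in range(height)]
```

Rendering once with `eye=-1` and once with `eye=+1` gives the two halves of a side-by-side stereo pair.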


Camera assembly scans around a sphere in u and v

I have some more work to do – like integrating animated live “hero” elements with the static stereo background to see if it still holds up to the eye – but this will do for now. Here’s a link to a test that’s made for viewing in Cardboard: http://scottmsinger.com/vrar/sphcam/  The icky pinching at the top of the sky is from an attractive but non-spherical sky and cloud texture.

Side by side
right eye from spherical projection stereo camera

Experimenting with particles, volumes, refractive and reflective surfaces is next. It’s promising, but tricky – the amount of filtering and pixel sampling is going to take some dialing in.

But there’s a problem. And it’s the same problem that map makers have when they try to flatten the globe into a single plane – you end up getting a mismatch of sampling distances and densities toward the poles. In my renders this results in a little “S” shaped warping as cameras are pointing too far up or down. The results are still pretty cool, and maybe good enough for many applications, but far from perfect. The other issue is that it means that just as many samples circle the sphere at the poles as at the equator, and whether you want to look at it as too much information in some places or not enough in the others – the fact is that it’s not a very efficient use of rendering resources.
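The inefficiency is easy to put a number on. This little Python sketch computes how much of the sphere a single pixel covers in a given row of a latitude-longitude image. The 2048 x 1024 resolution is just an assumed example, and the function name is mine.

```python
import math


def pixel_solid_angle(row, width, height):
    """Solid angle in steradians covered by one pixel in a given row of a
    width x height latitude-longitude image; row 0 touches the north pole."""
    lat_top = math.pi / 2.0 - math.pi * row / height
    lat_bottom = math.pi / 2.0 - math.pi * (row + 1) / height
    # Area of a latitude band is proportional to the difference of sines.
    return (2.0 * math.pi / width) * (math.sin(lat_top) - math.sin(lat_bottom))


width, height = 2048, 1024  # assumed example resolution
pole = pixel_solid_angle(0, width, height)
equator = pixel_solid_angle(height // 2, width, height)
ratio = equator / pole      # how much more sky an equator pixel covers
total = sum(width * pixel_solid_angle(r, width, height)
            for r in range(height))  # should add up to the whole sphere
```

At this resolution a pixel on the equator covers several hundred times more sky than a pixel in the top row, yet both cost the same to render.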

So I’m working on another approach that uses geodesic spheres to derive sample points, rotating the sphere points without rotating the normals. Oh, there will be problems with that too, I’m sure. But this is when I get to fall back on my MFA in painting and let all my higher-math colleagues solve the problems I make for them.
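For anyone who wants to play along, here is a minimal Python sketch of the first half of that idea: building a geodesic sphere by subdividing an icosahedron, which gives sample points that are spread far more evenly than a latitude-longitude grid. It is the standard textbook construction rather than my finished approach, the function names are made up for this post, and it does not attempt the part about rotating the points without rotating the normals.

```python
import math


def normalize(v):
    """Push a point out (or in) to the unit sphere."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def icosahedron():
    """Twelve vertices and twenty triangular faces of a unit icosahedron."""
    t = (1.0 + math.sqrt(5.0)) / 2.0  # golden ratio
    verts = [(-1, t, 0), (1, t, 0), (-1, -t, 0), (1, -t, 0),
             (0, -1, t), (0, 1, t), (0, -1, -t), (0, 1, -t),
             (t, 0, -1), (t, 0, 1), (-t, 0, -1), (-t, 0, 1)]
    faces = [(0, 11, 5), (0, 5, 1), (0, 1, 7), (0, 7, 10), (0, 10, 11),
             (1, 5, 9), (5, 11, 4), (11, 10, 2), (10, 7, 6), (7, 1, 8),
             (3, 9, 4), (3, 4, 2), (3, 2, 6), (3, 6, 8), (3, 8, 9),
             (4, 9, 5), (2, 4, 11), (6, 2, 10), (8, 6, 7), (9, 8, 1)]
    return [normalize(v) for v in verts], faces


def subdivide(verts, faces):
    """Split every triangle into four, projecting new points onto the sphere."""
    verts = list(verts)
    cache = {}  # shared edges must reuse the same midpoint vertex

    def midpoint(a, b):
        key = (min(a, b), max(a, b))
        if key not in cache:
            mid = tuple((verts[a][i] + verts[b][i]) / 2.0 for i in range(3))
            verts.append(normalize(mid))
            cache[key] = len(verts) - 1
        return cache[key]

    new_faces = []
    for a, b, c in faces:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        new_faces += [(a, ab, ca), (b, bc, ab), (c, ca, bc), (ab, bc, ca)]
    return verts, new_faces


def geodesic_sphere(levels):
    """Sample points on the unit sphere after `levels` rounds of subdivision."""
    verts, faces = icosahedron()
    for _ in range(levels):
        verts, faces = subdivide(verts, faces)
    return verts, faces
```

Each round of subdivision roughly quadruples the number of sample points, so `geodesic_sphere(levels)` can be dialed up until the sampling is dense enough for the render.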