Some words on accuracy

76250_micrometer_lg

I consult and advise on a variety of topics, from visual effects and animation, to A/M/VR, to IoT and cloud processing, and one thing comes up again and again: “The solution needs to be as accurate as it can be.” Whenever I hear this, I’m reminded of something that happened a long time ago, in my pre-digital life. How accurate is “as accurate as it can be”?

In 1982, after I had graduated high school, I went to guitar repair school in Spring Hill, TN – then a sleepy, rural town outside of Nashville. I learned a few things there: how to get into Nashville bars at 17, that rosewood poisoning makes you really sick, and that accuracy is a function of, well … function.

The instructor gave the students little custom-made rulers with all kinds of nifty guitar repair measurements on them for fret spacing, fingerboard radii, etc. They were made by a machinist friend of his, and he had a little story that went with them. When he asked his friend how much the rulers would cost, his friend asked him, “How accurate do they need to be?” He said, “As accurate as you can make them.” His friend replied, “Then each one would cost $10,000,” to which my instructor asked, “How accurate is $20 worth of accurate?” His friend said, “Probably accurate enough.”

That was 1982, and I’m not sure $20 worth of accuracy is as accurate anymore. But the lesson it taught us was that when repairing something, one must first ask about the needs of the repair. Is it a crappy guitar where replacing the frets would cost more than the price of an entire, better instrument? Or is it a priceless old Martin that needs to have the bridge reattached with meticulous care? Each requires the best work you can do, but the best work you can do for each will be different.

IoT and AR rely heavily on GPS for location data. GPS is (to most people) surprisingly inaccurate – often only to within 10 meters – and it can also be very noisy. But in many cases it’s accurate enough, especially after filtering. We can, for instance, track a fleet of trucks through the city, or hold up our phones and see that the bar with the best happy hour is “over there” through the heads-up-display beer finder app. Now, if we needed that app to replace the bar’s sign with a 3D list of drink specials that tracks flawlessly to it, we would require a different, more accurate location tracking technique.
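To make the filtering point concrete, here’s a minimal sketch of smoothing a noisy GPS track with an exponential moving average – a hypothetical illustration (the function and parameter names are mine, not any tracker’s API); real systems would more likely use a Kalman filter:

```python
# Hypothetical sketch: de-noising GPS fixes with an exponential
# moving average. alpha controls how much we trust each new fix.

def smooth_track(fixes, alpha=0.3):
    """fixes: list of (lat, lon) tuples; alpha in (0, 1]."""
    smoothed = [fixes[0]]
    for lat, lon in fixes[1:]:
        prev_lat, prev_lon = smoothed[-1]
        smoothed.append((
            alpha * lat + (1 - alpha) * prev_lat,
            alpha * lon + (1 - alpha) * prev_lon,
        ))
    return smoothed

# Three jittery fixes from a truck that is barely moving
noisy = [(34.0, -118.0), (34.0003, -118.0002), (33.9998, -117.9999)]
clean = smooth_track(noisy)
```

Fleet tracking can live with the lag this introduces; the sign-replacing AR app could not – which is exactly the “how accurate does it need to be” question.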

It’s more than just a matter of choosing the right tools for the job, however; it’s also scaling the approach of a solution to match the needs of the task at hand. “How accurate do they need to be?” $20 worth of accurate? 30 feet worth of accurate? Animated GIF accurate? Substitute-your-reality-with-another accurate? Escape-the-“uncanny valley” accurate? Or maybe 1 second of accurate? 3 minutes of accurate? The Nyquist theorem tells us that to accurately digitize an analog sound, we need to sample it at more than twice the highest frequency of the signal we want to represent. Before launching into a solution, try asking yourself: “What is the metaphorical ‘Nyquist frequency’ of the problem I’m trying to solve?” and build the solution accordingly. Of course the interesting question becomes: “What is the Nyquist frequency of repairing a Martin guitar?”
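The Nyquist limit is easy to demonstrate in a few lines. This sketch (my own toy example) samples a 5 Hz cosine at only 8 Hz – below the 10 Hz Nyquist rate – and shows the samples are identical to those of a 3 Hz “alias” tone, while a 50 Hz sample rate keeps the two distinct:

```python
import math

def sample(freq_hz, rate_hz, n):
    """n samples of a cosine of freq_hz taken at rate_hz."""
    return [math.cos(2 * math.pi * freq_hz * i / rate_hz) for i in range(n)]

# Undersampled: 8 Hz < 2 * 5 Hz, so 5 Hz collapses onto its 8 - 5 = 3 Hz alias
tone_5hz = sample(5, 8, 16)
alias_3hz = sample(3, 8, 16)
assert all(abs(a - b) < 1e-9 for a, b in zip(tone_5hz, alias_3hz))

# Properly sampled at 50 Hz, the two tones are clearly different signals
assert any(abs(a - b) > 0.1 for a, b in zip(sample(5, 50, 16), sample(3, 50, 16)))
```

Once you’ve undersampled, no amount of downstream cleverness can tell the real tone from its alias – the information is simply gone, which is why the question has to be asked up front.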


Film grain, persistence of vision, sensor tubes, and experienced resolution.

At various times in my career I’ve had to think about image resolution, film grain, how to pull a clean image from noisy video, and orbiting gamma-ray observatory sensor tubes. The question is: how much digital resolution is needed to represent the analog resolution of motion picture film? Or maybe it’s more of a statement: how to problematize the question of “how much digital resolution is needed to represent the analog resolution of motion picture film?” – or at least have fun thinking about it.

When I worked at the visualization lab at UCR, in the dark ages, I remember a physicist talking about a project he was working on for an orbiting gamma-ray observatory that was going to map background radiation from outer space. They wanted to do this at a very high resolution. The problem was that the gamma particle sensor tubes they used could only resolve to a couple of degrees at a time – basically 1/180th of the sky in a circle. A 180-sample map is not very high resolution. But he had a trick. The orbiting platform would be incredibly stable and predictable, so they could take a bunch of overlapping coarse passes with such accuracy that his software would be able to synthesize a much higher resolution map of the sky – to fractions of a degree. This is how things like synthetic aperture radar work.

A little while later a mathematician came into the lab and wanted a still from the X-Files credit sequence – “The Truth is Out There” – for a presentation slide, but we couldn’t get a clean screen grab from the VHS tape. It was weird – the text and the background were stable, and at playback speed the image seemed clean enough – it was only when we’d freeze a frame that the text melted into a fuzzy blob of static. So then I thought about it: basically every frame is like a gamma particle collection tube; the TV screen isn’t moving and neither are the text or the background. What if I were to “blend” a number of these frames together? Would I get a much clearer result? The answer was yes. And not just because of an NTSC fields-vs-frames thing – the more frames I combined, the clearer the text became. What that blending method was I’ll leave up to the reader – and if you figure it out, let me know, because I’ve forgotten. The point was that accumulating noisy images over time emphasized their similarities – the text – and deemphasized the differences – the smearing static. I’ve also used a similar technique to accumulate/synthesize a high resolution still from multiple low resolution renders, each with slight camera shifts.
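Whatever the exact blending was back then, simple per-pixel averaging demonstrates the effect. Here’s a toy sketch (entirely synthetic – a short 1-D “image” stands in for the VHS grabs): the shared signal survives the average, while the random static shrinks roughly as one over the square root of the frame count:

```python
import random

random.seed(42)
signal = [0.0, 1.0, 0.0, 1.0, 1.0, 0.0]  # stands in for the stable text/background

def noisy_frame():
    # one "VHS grab": the signal plus heavy random static
    return [s + random.gauss(0, 0.5) for s in signal]

def accumulate(frames):
    # per-pixel average across all frames
    n = len(frames)
    return [sum(px) / n for px in zip(*frames)]

one = noisy_frame()
many = accumulate([noisy_frame() for _ in range(200)])
# `many` sits far closer to `signal` than any single grab:
# the noise std drops by roughly 1/sqrt(200)
```

The same arithmetic underlies the gamma-tube trick and the multi-render super-resolution stunt – the only extra ingredient there is knowing precisely how the samples are offset from each other.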

Okay, so back to film grain and the resolution needed to represent a movie film frame with a digital frame. Here’s the fun part to mull over. A digital image is a regular grid of pixels in rows and columns. A film image is an irregular matrix of physical grains of stuff holding onto dyes of different colors – they aren’t uniform in size and certainly not uniform in location – and all of this changes, randomly, for every frame. Your brain does a great job of blending it all together over time, just like blending together the video frames. How much digital resolution is needed to represent movie film? I remember when a 2K image was going to be more than enough. Now we’re pretty sure we need 4K. I wonder what kinds of phantom resolutions happen in our minds from the accumulation of unpredictable tiny grains of color? What kinds of resolution apertures can we synthesize out of all that noise? How do we experience different presentations of visual resolution?

Here’s a link with some neat diagrams illustrating synthetic aperture radar: http://www.radartutorial.eu/20.airborne/ab07.en.html

 

An Old Map and Declarative 3D

This is an early “work in progress” visualization of an 18th-century map drawn by Giambattista Nolli, using A-Frame declarative HTML markup extensions for VR/3D in WebGL – with procedurally generated geometry and baked lighting from Houdini. Lots more to do and learn. Eventually it will be part of an AR promo piece, but I couldn’t resist.

aaa_nolli_splash

(better navigation in Cardboard, varied building heights, and better global illumination to come)

 

A Tale of Two Frame Rates

I had the good fortune of attending a local SIGGRAPH chapter talk by Bruna Berford of Penrose Studio regarding production methodology and how they approached animation in their beautiful and emotionally compelling VR experience “Alumette”. She presented a good view into the difficulties, challenges, and rewards of adapting to working in this new medium. And let me say that, to my thinking, they are fully embracing the new technology as a narrative medium.

But that’s not what this post is about. This post is about a misconception that people in V/M/AR have around the concept of frame rate – specifically, the holy grail of a 90+ fps redraw rate. This is held up as a metric that must be achieved for a non-nauseating viewing experience, and is usually stated as an across-the-board dictum. Alumette, however, threw a very nice wrench into that, one which points to something I’ve tried to articulate in the past: there are two different frame rates going on here. And that difference is apparent in Penrose’s use of stop motion frame rates for its animation.

The first “frame rate” is the one that’s usually meant, and I think of it as the perceptual, or maybe even proprioceptive, frame rate. This is the frame rate that corresponds to how well the environment tracks your body’s movements – for instance, when you turn your head, or when you walk around in a room-scale experience. This is the one that tells your lizard brain whether or not you are being lied to. But a lot of people, including seasoned veterans, stop here, assuming the matter is settled, and I think there’s a second frame rate at work.

The second is what I would call the exterior frame rate. This is the frame rate of the displayed content. And in Alumette this was definitely not at 90+ fps. In fact it was at a consciously much “slower” and less constant frame rate because it was being animated on hard, held keys with no interpolation. This was to emphasize the poses of the animation. The result was an elegant reference to traditional stop motion animation, with all of the artistic start/stop and a wonderfully surreal sense of time. And the overall experience in VR was not so much watching a stop motion animation, but rather existing in space with one. It was pretty cool.

The “content” was running at what I might guess averaged to ~12 fps, but the display of it – and, more importantly, my perception of the experience – was at the magic 90+ fps. This is an important distinction, especially when it comes to content creation. Would 360 video at a lower playback rate, say 18 fps, give us that old Super 8 home movie feel as long as the video sphere it’s projected onto was moving seamlessly? Could a game engine environment be optimized to hold frames of animation at 30 fps, allowing temporally redundant data to limit draw calls or GPU memory writes?
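The two-frame-rate idea fits in a few lines. This is my own sketch, not Penrose’s pipeline: the headset redraws (and tracks) at 90 fps, but each display frame samples the animation at a timestamp quantized down to the last held 12 fps key, so poses hold stop-motion style while the view stays smooth:

```python
# Sketch: decoupling the display (tracking) rate from the content rate.
DISPLAY_FPS = 90   # headset redraw: what your lizard brain judges
CONTENT_FPS = 12   # held animation keys: the stop-motion look

def content_time(display_frame):
    """Quantize a display frame's timestamp down to the last held content key."""
    t = display_frame / DISPLAY_FPS
    return int(t * CONTENT_FPS) / CONTENT_FPS

# Display frames 0..7 all sample the same held pose (t = 0.0);
# the pose first advances at display frame 8.
```

A renderer using a scheme like this could, in principle, reuse the cached pose for the seven redundant draws – exactly the kind of optimization the 30 fps question above is pointing at.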

Spherical Stereo Camera for Immersive Rendering

At SIGGRAPH this year in LA, I was really impressed by a presentation Mach Kobayashi (currently at Google) gave at the Pixar RenderMan User Group. It involved how to render 360 panorama stills for viewing in VR. He pointed out that the naive approach of placing two panorama cameras side by side breaks as you look away from the plane the cameras are in, because the views cross – the stereo version of a broken clock being right twice a day.

vr_sphCam_naive

He had a great solution involving ray-tracing the pictures from the tangent vectors of a cylinder, where the cylinder width is the inter-pupillary distance. And it works great:

http://www.machwerx.com/2015/08/21/rendering-for-vr/

But I was left wondering “what happens if you look up or down?” The answer must involve a sphere. So I gave myself a little project: ray-trace from a sphere to get the stereo result from every angle. And it almost works – like 98% works.

The basic idea is to extend a basic spherical panorama camera by adding an offset to the ray origin without adjusting the viewing direction. The image is rendered one pixel at a time by sweeping the “camera” horizontally in “u” and vertically in “v” – like traveling the Earth by stepping a little to the east and snapping a one-pixel picture of the sky, all the way around, then taking a step north and repeating. Luckily the computer is faster, and the 3D scene database holds still for the shutter.
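Here’s a minimal sketch of that camera – my own reconstruction for illustration, not production code. For each panorama pixel (u, v) it builds the usual lat-long view direction, then shifts the ray origin sideways by half the inter-pupillary distance without touching the direction:

```python
import math

IPD = 0.064  # meters; a common inter-pupillary distance (assumed value)

def stereo_ray(u, v, eye):
    """u, v in [0, 1); eye is +1 (right) or -1 (left).
    Returns (origin, direction) for one lat-long panorama pixel."""
    theta = u * 2 * math.pi        # longitude: sweep all the way around
    phi = (v - 0.5) * math.pi      # latitude: -pi/2 (down) .. +pi/2 (up)
    # View direction on the unit sphere (y is up)
    d = (math.cos(phi) * math.sin(theta),
         math.sin(phi),
         math.cos(phi) * math.cos(theta))
    # Horizontal "right" vector, perpendicular to the view direction
    right = (math.cos(theta), 0.0, -math.sin(theta))
    # Offset the origin, not the direction
    origin = tuple(eye * (IPD / 2) * r for r in right)
    return origin, d
```

Note the “right” vector stays horizontal everywhere – which is exactly why this almost works: near the poles the horizontal eye offset no longer matches what your actual eyes would do, producing the warping described below.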

 

vr_sphCam_fig
Camera assembly scans around a sphere in u and v

I have some more work to do – like trying to integrate animated, live “hero” elements with the static stereo background to see if it still holds up to the eye – but this will do for now. Here’s a link to a test made for viewing in Cardboard: http://scottmsinger.com/vrar/sphcam/  The icky pinching at the top of the sky is from an attractive but non-spherical sky-and-cloud texture.

vr_sphCam_sidebyside
Side by side
vr_sphCam_landscape_D.sphericalRight.2
right eye from spherical projection stereo camera

Experimenting with particles, volumes, refractive and reflective surfaces is next. It’s promising, but tricky – the amount of filtering and pixel sampling is going to take some dialing in.

But there’s a problem. And it’s the same problem map makers have when they try to flatten the globe into a single plane – you end up with a mismatch of sampling distances and densities toward the poles. In my renders this results in a little “S”-shaped warping as the camera points too far up or down. The results are still pretty cool, and maybe good enough for many applications, but far from perfect. The other issue is that just as many samples circle the sphere at the poles as at the equator, and whether you want to look at it as too much information in some places or not enough in the others, the fact is that it’s not a very efficient use of rendering resources.

So I’m working on another approach that uses geodesic spheres to derive sample points, rotating the sphere points without rotating the normals. Oh, there will be problems with that too, I’m sure. But this is when I get to fall back on my MFA in painting and let all my higher-math colleagues solve the problems I make for them.
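Proper geodesic subdivision needs an icosahedron vertex-and-face table; as a compact stand-in that has the same key property – near-uniform sample directions with no clustering at the poles – here’s a Fibonacci-spiral sphere sampler (a different but related uniform-sampling trick, my own illustration, not the approach described above):

```python
import math

def fibonacci_sphere(n):
    """Return n near-uniformly distributed unit vectors on the sphere."""
    golden = math.pi * (3 - math.sqrt(5))  # golden angle in radians
    pts = []
    for i in range(n):
        y = 1 - 2 * (i + 0.5) / n          # even spacing in height...
        r = math.sqrt(1 - y * y)            # ...gives even spacing in area
        theta = golden * i                  # spiral around the axis
        pts.append((r * math.cos(theta), y, r * math.sin(theta)))
    return pts

pts = fibonacci_sphere(500)  # 500 sample directions, poles no denser than equator
```

Unlike the lat-long sweep, every sample here covers roughly the same solid angle, which is the efficiency the geodesic approach is after.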

vr_sphCam_geod_fig

 

 

What contests are we winning?

Talking with people in the AR/VR world, there’s a constant, silly question buzzing in the air like a gnat: “Who’s winning – VR or AR?” It is an interesting question, not for what it asks but for what asking it implies in the first place. Is this all a contest? With a winner and a loser? Have we become so obsessed with the “gamified marketplace of ideas” that we can’t actually be motivated without some implicit or explicit conflict or large plush prize? But what even is the conflict? What is there to lose in this contest of AR v VR?

The contest implies they are the same project, suggesting that their finish lines are the same finish line. They draw upon a lot of the same technology, for sure, but so do mobile phones, connected thermostats, smart TVs, and watches. The basic antagonism seems to revolve around posturing – for the best head-mounted display, the purest vision of what is meant by “immersive” or “reality,” or who is the reigning champ of the “ultimate experience.” And this would all be as ludicrous a sideshow as it sounds, except for the number and stature of people involved on both “sides” who act like it’s a serious debate. In fact it was an actual debate at this year’s Augmented World Expo, and only a marginally tongue-in-cheek one.

And I get how important it is for a hardware maker to capture market share, get funding, or get acquired. Or for a game publisher to drum up marketing collateral to prep for a release. What’s a little bothersome is how easily this marketing-spun copy is eaten up by people who should know better and then regurgitated as a real, pressing issue, when the real pressing issue is that people need to move past the towel snapping and make more complete things that are actually worth doing.

Manufacturers of HMDs want to demonstrate that each has the better display resolution, the better optics, the better hardware integration – this makes perfect sense. It’s like competing computer chip makers claiming theirs is best because of clock speed, number of cores, or instruction sets – it’s reasonable. The differences in comfort, the tradeoffs between configurability and convenience, and the comparative aesthetics are like the PC v Mac debate – okay, I get that. Arguing whether VR or AR will “win,” or is “better,” is like arguing that a realtime, embedded OS is inherently better than an interactive one like Windows, or that a freight train is better than a cargo ship.

If we take the crassly entrepreneurial measure of money, then AR has already “won.” It has market share, it’s profitable in products now, it generates revenue. But really, it’s a silly debate – we’ve been augmenting and virtualizing reality for years: the transistor radio, books, air freshener, hell, even the rearview mirror. Timothy Leary is laughing at us all right now because he “won” this contest 50 years ago – and without a computer. So what should you do when someone asks you, “Who will win, AR or VR?” I think I know what Dr. Leary would do.

Is there a VR equivalent of the whip pan?

There are many ways that a virtual reality cinematic experience differs from a traditional one. Much of the discussion is around whether or not certain techniques translate from one world to the other, and what can be done psychologically, physiologically, or mechanically without inducing pain or nausea in the viewer. But there’s less discussion about the narrative structuring elements of traditional filming and editing techniques and what their immersive counterparts might be. There are numerous camera and editorial devices that we have “learned” to understand in watching movies – the so-called cinematic language. Things like the over-the-shoulder point-of-view shot, the two-angle dialog shot, or the sequence that moves from a long establishing shot to a medium shot to a closeup to introduce a place and/or character. But I’m going to choose the whip pan as the best example to illustrate some basic differences between traditional and VR cinema.

But why the whip pan? Because it’s a very useful and ubiquitous film transition that functions on many levels, yet it is almost entirely useless, physically, in a VR experience, because that much camera motion is disorienting and causes motion sickness.

So is there a VR analog of the whip-pan? And what is a whip pan? Simply put – it’s where the camera spins very quickly from one point of view to reveal another one. This winds up putting the ending view in juxtaposition to the starting view. A common example might be a tense POV shot where the camera whips around to reveal the threat – a monster, the enemy, or antagonist of some kind. Also, because one byproduct of moving a camera very fast is an exaggerated motion blur that obscures what’s being filmed, a whip pan can be used to hide a transition to a radically different camera angle or even a different setting or time. The whip pan wrenches our perception and attention from one understanding of the story to another.

To illustrate the difficulty in finding a VR analog, let’s look at a whip pan example applied to both traditional projected cinema and immersive cinema.

Traditional: We are watching a horror movie. The setting is a forest at night; the shot is a POV shot and we are walking forward slowly, scared, seeing only what’s lit directly by our flashlight. We hear a noise from behind – a breaking twig – the camera whips around to exactly where the monster is, then drops down to just miss an enormous claw aimed right at where our head would have been.

Immersive: We are experiencing the horror movie in a first-person role, moving through the forest at a rate that’s not so disorienting as to be uncomfortable or distracting; maybe our attention is following the flashlight beam. There’s a sudden noise behind us, a breaking twig. We turn in our seats to, maybe, see the monster in time to, maybe, duck underneath its giant claw. Maybe our reaction times are too slow and we miss the action of the monster entirely but are thrown into the next segment of the narrative anyway. Maybe we’re left confused and lose interest.

So what is it that makes the traditional whip pan so effective? What does it facilitate, and what does it lend to the overall emotional effect of the shot? The whip pan is the reaction of the audience; we have been conditioned as an audience to interpret this motion as our own action, something which on some level we feel we have created and are in control of. Our experience is that we are reacting to the sound by actively turning toward it and facing it, when in fact that’s very far from the truth. In reality the “camera” presciently reacts to a sound created at precisely the right moment, at exactly the right speed, to get to the perfect point at the next perfect moment to both reveal the horrifying threat and simultaneously, miraculously escape it. This is not an exercise of free will; this is a controlling cinematic device with a very specific narrative purpose and outcome: disorient, respond, reveal, escape.

There are also practical components of a whip pan that are often just as important. People in VFX know that you can hide a multitude of sins in the motion blur. Even in non-effects films, editors can hide a number of unlikely or incongruous transitions by dissolving through a fast camera move. Very often the camera angle or placement from the A part of the pan is not desirable for the B part. Or maybe you want to pan from being tall and looking down to being short and looking up. This can be very effective at making a revealed threat seem more ominous, or it might simply be an accommodation for practical size differences. The point is that the confusion and the disorientation of the pan itself is used to hide incongruities and inconsistencies that occur in getting from your origin to your destination. Because we know to accept this motion blur as transition from one thing to another it can also be used to transition from one place or time to another – we begin the pan and everything is green, we end it and everything is covered in snow – we know that we went from summer to winter and probably understand that we had a whirlwind autumn.

The whip pan goes beyond a simple camera move and becomes a complex mechanism of narrative cinema. How do we give a VR audience the same emotional experience? Should we? It may be that the whip pan is too idiomatic a part of traditional film to have an easy, direct counterpart – like one of those German words that has to be translated into an entire sentence in English. Maybe it’s a matter of breaking entirely from a first-person POV experience to an angle where we see the entire resolution of the action? Maybe trying to do that at all is missing the point entirely. Maybe VR is not about controlling the view, but rather controlling the environment. Maybe narrative control is an outdated authoritarian construct. Maybe it all just requires our experience of VR to mature, and – like those people who stopped running out of early movies from oncoming trains – we’ll stop throwing up and learn how to read VR’s abrupt new camera moves.