Some words on accuracy


I consult and advise on a variety of topics, from visual effects and animation, to A/M/VR, to IoT and cloud processing, and one thing comes up again and again – “The solution needs to be as accurate as it can be.” Whenever I come across this, I’m reminded of something that happened a long time ago, in my pre-digital life, and of the question it raised: how accurate is “as accurate as it can be”?

In 1982 after I had graduated high school, I went to guitar repair school in Springhill, TN – then a sleepy, rural town outside of Nashville. I learned a few things there: how to get into Nashville bars at 17, rosewood poisoning makes you really sick, and that accuracy is a function of, well … function.

The instructor gave the students little custom made rulers with all kinds of nifty guitar repair measurements on them for fret spacing, fingerboard radii, etc. They were custom made by a machinist friend of his, and he had a little story that went with them. When he asked his friend how much the rulers would cost, his friend asked him, “How accurate do they need to be?” He said, “As accurate as you can make them.” His friend replied, “Then each one would cost $10,000,” to which my instructor then asked, “How accurate is $20 worth of accurate?” His friend said, “Probably accurate enough.”

That was 1982, and I’m not sure if $20 worth of accuracy is as accurate anymore. But the lesson he taught us was that when repairing something, one must first ask about the needs of the repair. Is it a crappy guitar where replacing the frets would cost more than the price of an entire, better instrument? Or is it a priceless old Martin that needs to have the bridge reattached with meticulous care? Each requires the best work you can do, but the best work you can do for each will be different.

IoT and AR rely heavily on GPS to get location data. GPS is (to most people) surprisingly inaccurate – only to within 10 meters – and it can also be very noisy. But in many cases it’s accurate enough, especially after filtering. We can, for instance, track a fleet of trucks through the city, or hold up our phones and see that the bar with the best happy hour is “over there” through the heads-up display of a beer finder app. Now, if we need that app to replace the bar’s sign with a 3D list of drink specials which tracks flawlessly to it, we would require a different, more accurate location tracking technique.
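To make “accurate enough after filtering” concrete, here is a minimal sketch. It assumes a parked truck, GPS noise with a 10 meter standard deviation on each axis, and a simple exponential smoothing filter; the function names and the `alpha` value are illustrative choices, and a real tracker would more likely use something like a Kalman filter.

```python
import random


def smooth_track(points, alpha=0.2):
    """Exponentially smooth a list of (x, y) positions given in meters."""
    smoothed = []
    sx, sy = points[0]
    for x, y in points:
        # Move the estimate a fraction of the way toward each new reading.
        sx += alpha * (x - sx)
        sy += alpha * (y - sy)
        smoothed.append((sx, sy))
    return smoothed


def mean_error(points, truth):
    """Average distance in meters between the readings and the true position."""
    total = sum(((x - truth[0]) ** 2 + (y - truth[1]) ** 2) ** 0.5 for x, y in points)
    return total / len(points)


random.seed(1)
truth = (100.0, 50.0)  # hypothetical parked truck
raw = [(truth[0] + random.gauss(0, 10), truth[1] + random.gauss(0, 10)) for _ in range(500)]
filtered = smooth_track(raw)

# Skip the first 50 readings so the filter has settled before comparing.
print("raw error:     ", round(mean_error(raw[50:], truth), 1), "m")
print("filtered error:", round(mean_error(filtered[50:], truth), 1), "m")
```

The trade-off is lag: a smaller `alpha` gives a steadier position but responds more slowly when the truck actually moves, which is exactly the kind of “how accurate does it need to be?” decision the application has to make.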

It’s more than just a matter of choosing the right tools for the job, however; it’s also scaling the approach of solutions to match the needs of the tasks at hand. “How accurate do they need to be?” $20 worth of accurate? 30 feet worth of accurate? Animated GIF accurate? Substitute your reality with another accurate? Escape the “uncanny valley” accurate? Or maybe 1 second of accurate? 3 minutes of accurate? The Nyquist theorem tells us that to accurately digitize an analog sound, we need to sample it at over twice the highest frequency of the signal we want to represent. Before launching into a solution, try asking yourself: “What is the metaphorical ‘Nyquist frequency’ of the problem I’m trying to solve?”, and build the solution accordingly. Of course the interesting question becomes: “What is the Nyquist frequency of repairing a Martin guitar?”
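For the literal version of that theorem, a small sketch shows what goes wrong below the limit. The 10 Hz sample rate and the 7 Hz and 3 Hz tones are arbitrary illustrative numbers, and the function names are made up for this example.

```python
import math


def sample(freq_hz, rate_hz, count):
    """Sample a unit sine wave of freq_hz at rate_hz, returning count samples."""
    return [math.sin(2 * math.pi * freq_hz * n / rate_hz) for n in range(count)]


def alias_frequency(freq_hz, rate_hz):
    """Apparent frequency of a sampled tone, folded into the range 0..rate/2."""
    folded = freq_hz % rate_hz
    return min(folded, rate_hz - folded)


rate = 10.0                    # samples per second, so the limit is 5 Hz
high = sample(7.0, rate, 20)   # above the limit
low = sample(3.0, rate, 20)    # below the limit

# Sample for sample, the 7 Hz tone is just the 3 Hz tone with its sign flipped,
# so once digitized the two cannot be told apart by frequency.
indistinguishable = all(abs(h + l) < 1e-9 for h, l in zip(high, low))

print(alias_frequency(7.0, rate))  # 3.0
print(indistinguishable)           # True
```

Sampling faster than needed costs storage and bandwidth; sampling slower does not just lose detail, it invents a wrong answer.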


Film grain, persistence of vision, sensor tubes, and experienced resolution.

At various times in my career I’ve had to think about image resolution, film grain, how to pull a clean image from a noisy video, and orbiting gamma-ray observatory sensor tubes. The question is: how much digital resolution is needed to represent the analog resolution of motion picture film? Or maybe it’s more of a statement: How to problematize the question of – “how much digital resolution is needed to represent the analog resolution of motion picture film?”, or at least have fun thinking about it.

When I worked at the visualization lab at UCR, in the dark ages, I remember a physicist talking about a project he was working on for an orbiting gamma-ray observatory that was going to map background radiation from outer space. They wanted to do this at a very high resolution. The problem was that the gamma particle sensor tubes they used could only resolve to a couple degrees at a time – basically 1/180th of the sky in a circle. A 180 sample map is not very high resolution. But he had a trick. The orbiting platform would be incredibly stable and predictable, so they could take a bunch of overlapping coarse passes with such accuracy that his software would be able to synthesize a much higher resolution of the sky – to fractions of a degree. This is how things like synthetic aperture radar work.

A little while later a mathematician came into the lab and wanted a still from the X Files credit sequence “The Truth is Out There”, for a presentation slide, but we couldn’t get a clean screen grab from the VHS tape. It was weird – the text and the background were stable, and at playback speed the image seemed clean enough – it was only when we’d freeze a frame that the text melted into a fuzzy blob of static. So then I thought about it – basically every frame is like a gamma particle collection tube, the TV screen isn’t moving and neither are the text or the background. What if I were to “blend” a number of these frames together? Would I get a much clearer result? The answer was yes. And not just because of an NTSC fields vs frames thing – the more frames I combined the clearer the text became. What that blending method was I’ll leave up to the reader – and if you figure it out, let me know, because I’ve forgotten. The point was that accumulating noisy images over time emphasized their similarities – the text – and deemphasized the differences – the smearing static. I’ve also used a similar technique to accumulate/synthesize a high resolution still from multiple low resolution renders, each with slight camera shifts.
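One plausible reading of “blend” is a plain per-pixel average across frames. That is an assumption for the sake of illustration, not a recollection of the method actually used, and the one-dimensional striped “image” below is only a stand-in for stable text on a stable background.

```python
import random


def noisy_frame(clean, sigma, rng):
    """One frame: the stable image plus fresh random static."""
    return [pixel + rng.gauss(0, sigma) for pixel in clean]


def blend(frames):
    """Per-pixel mean across all frames."""
    count = len(frames)
    return [sum(pixels) / count for pixels in zip(*frames)]


def rms_error(image, clean):
    """Root-mean-square difference between an image and the clean reference."""
    return (sum((a - b) ** 2 for a, b in zip(image, clean)) / len(clean)) ** 0.5


rng = random.Random(7)
clean = [1.0 if (i // 8) % 2 else 0.0 for i in range(256)]  # stripes standing in for text
frames = [noisy_frame(clean, 0.5, rng) for _ in range(64)]

single = rms_error(frames[0], clean)
blended = rms_error(blend(frames), clean)
print("one frozen frame:", round(single, 3))
print("64 frames blended:", round(blended, 3))
```

Because the static is different in every frame and the picture is not, the static averages toward zero while the picture stays put; with independent noise the error shrinks roughly with the square root of the number of frames.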

Okay so back to film grain and the resolution needed to represent a movie film frame with a digital frame. Here’s the fun part to mull over. A digital image is a regular grid of pixels in rows and columns. A film image is an irregular matrix of physical grains of stuff holding onto dyes of different colors – they aren’t uniform in size and certainly not uniform in location – and this all changes randomly for every frame. Your brain does a great job of blending all of this together over time, just like blending together the video frames. How much digital resolution is needed to represent movie film? I remember when a 2K image was going to be more than enough. Now we’re pretty sure we need 4K. I wonder what kinds of phantom resolutions happen in our minds from the accumulation of unpredictable tiny grains of color? What kinds of resolution apertures can we synthesise out of all that noise? How do we experience different presentations of visual resolution?

Here’s a link with some neat diagrams illustrating synthetic aperture radar: http://www.radartutorial.eu/20.airborne/ab07.en.html

 

An Old Map and Declarative 3D

This is an early  “work in progress” visualization of an 18th century map drawn by Giambattista Nolli, using @A-Frame declarative HTML mark-up extensions for VR/3D in WebGL – with procedurally generated geometry and baked lighting in Houdini. Lots more to do and learn. Eventually it will be part of an AR promo piece but I couldn’t resist.

aaa_nolli_splash

(better navigation in cardboard, varied building heights, and better global illumination)

 

A Tale of Two Frame Rates

I had the good fortune of attending a local SIGGRAPH chapter talk by Bruna Berford of Penrose Studio, regarding production methodology and how they approached animation in their beautiful and emotionally compelling VR experience “Allumette”. She presented a good view into the difficulties, challenges, and rewards of adapting to working in this new medium. And let me say that to my thinking, they are actually embracing the new technology fully as a narrative medium.

But that’s not what this post is about. This post is about a misconception that people in V/M/AR have around the concept of frame rate. Specifically the holy grail of a 90+ fps redraw rate. This is held up as a metric which must be achieved for a non-nauseating viewing experience and is usually stated as an across-the-board dictum. Allumette, however, threw a very nice wrench into that, one which points to something that I’ve tried to articulate in the past. There are two different frame rates going on here. And that difference is apparent in Penrose’s use of stop motion frame rates for its animation.

The first “frame rate” is the one that’s usually meant, and I think of that as being the perceptual, or maybe even proprioceptive, frame rate. This is the frame rate that corresponds to how well the environment is tracking your body’s movements. For instance when you turn your head, or when you walk around in a room scale experience. This is the one that tells your lizard brain whether or not you are being lied to. A lot of people, including seasoned veterans, stop here, assuming that the matter is settled, but I think there’s a second frame rate at work.

The second is what I would call the exterior frame rate. This is the frame rate of the displayed content. And in Allumette this was definitely not at 90+ fps. In fact it was at a consciously much “slower” and less constant frame rate because it was being animated on hard, held keys with no interpolation. This was to emphasize the poses of the animation. The result was an elegant reference to traditional stop motion animation, with all of the artistic start/stop and a wonderfully surreal sense of time. And the overall experience in VR was not so much watching a stop motion animation, but rather existing in space with one. It was pretty cool.

The “content” was running at what I might guess averaged to ~12 fps, but the display of it, and therefore more importantly my perception of the experience, was at the magic 90+ fps. This is an important distinction – especially when it comes to content creation. Would 360 video at a lower playback rate, say 18 fps, give us that old Super8 home movie feel as long as the video sphere it’s projected onto was moving seamlessly? Could a game engine environment be optimized to hold frames of animation at 30 fps, allowing temporally redundant data to limit draw calls or GPU memory writes?
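The two rates can be sketched as a simple mapping from display refreshes to held animation poses. The 90 and 12 fps figures come from the guesses above, the poses here are evenly spaced (the real piece was described as less constant), and the function names are hypothetical; this says nothing about how Penrose actually built it.

```python
def content_frame(display_frame, display_fps=90, content_fps=12):
    """Index of the held animation pose shown on a given display refresh."""
    return (display_frame * content_fps) // display_fps


def pose_updates(display_frames, display_fps=90, content_fps=12):
    """Display frames on which the animation pose actually changes."""
    changes = []
    previous = None
    for frame in range(display_frames):
        current = content_frame(frame, display_fps, content_fps)
        if current != previous:
            changes.append(frame)
            previous = current
    return changes


# One second of display time: the head-tracked view is redrawn on all 90
# refreshes, but the animation only steps to a new pose on 12 of them.
one_second = pose_updates(90)
print(len(one_second))  # 12
print(one_second[:4])   # [0, 8, 15, 23]
```

Everything between two entries in that list is temporally redundant as far as the animation is concerned, which is where the possible savings in draw calls or memory writes would come from.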

Who will make the merchandise display cases for VR shopping?

Many are convinced that simultaneous, shared, social experiences in VR and other 3D immersive modalities are a foregone conclusion. Regardless of how deluded we might be in this, one thing becomes clear – in order for this to scale, we will need to have a consistent way of describing all of the stuff – much like how molecules are a consistent way of describing the real world. Luckily the virtual world is many orders of magnitude simpler than the actual physical world, and instead of the uncountable trillions of sub particle level interactions of matter, the virtual world needs only a truly astounding level of trackable events through a potentially manageable number of protocols and standards.

The problem is that even at many orders of magnitude simpler, the task of how to consistently describe “anything” so that we can share it, sell it, buy it, travel to it, hold it, toss it back and forth, etc. is still really, amazingly complicated. Much more complicated than say – the choice of game engine du jour, OBJ or FBX or Collada, or whether or not you have a cool physics engine. But what really are the basics of virtual matter that need description so that they can be manipulated in the ways we expect? I was thinking about this and came up with a functional, if prosaic example to get me into a more pragmatic frame of mind than say – blasting space zombie outlaws.

Let’s assume we have simultaneous social immersive 3D experiences delivered over a common framework. And let’s say that within that space there are millions of stores. And in many of these stores is the virtual equivalent of a merchandise display case. And let’s say your company makes display cases for virtual environments. There are a lot of assumptions here for sure, and the “display case” here is really just a conceptual placeholder for whatever the virtual world might offer up as a kind of “durable good”. But let’s put all that aside for the moment and assume that your business is making virtual display cabinets.

In the real world, display cabinets have certain features that make them more suited for some purposes than others, and yours are very good and specialized. In your case they are jewelry cases that have buttons on top that let a shopper rotate the shelves around forward and back. You make high end cabinets that are very durable and come in standard sizes that fit in with other leading retail fixture manufacturers’ products. The doors operate smoothly allowing ample access for the sales associate to quickly retrieve even the most tiny items the customer might want. When a business orders cabinets from you, they pay for them, you ship them out, they are installed and exist physically in place. No one can really duplicate them beyond manufacturing a knock off product.

In a virtual world retail businesses will want display cabinets, and just like in the real world they won’t normally want to design and manufacture them themselves. They will expect to buy them and for them to just simply work. Customers will be able to easily peruse their options and make their choices. They may want to try things on, see how they match the color of their eyes before they buy them. Your display cases will have to use the same “trying on” mechanisms that the rest of the display cases in the store do, because the store will want to support the latest most accurate shopping reality capture avatar system available. Your display case needs to be installable within the store’s inventory control scheme, but also installable within the store’s local Cartesian coordinate frame. It needs to be addressable within their asset management system so that stock changes and merchandising decisions can be pushed to the cabinets from central databases. Your cabinets will need to be backward compatible with this store’s stock and inventory system which is several versions out of date because “they like their system fine the way it is”, and they are a big customer so you need to keep their business.

And so let’s say now you’ve managed to make a future proof, universally accessible and addressable, fully inter-functional display cabinet, backward compatible with old virtual mercantile standards, with compliant e-commerce security features, but you still have another issue. How do you make sure that the store isn’t making copies of your display cabinets and using them across all their wholly owned subsidiaries? Or selling them overseas to offset a flat Christmas sales season? Or that they aren’t being stolen by a nefarious shopper and resold on the lucrative display cabinet black market?

This is where it’s all about standards. All about the protocols that set out the expected behaviors and configurations that define and prescribe how all of these magical virtual interactions happen. It’s the subatomic glue that connects all the disparate experiences into coherent, navigable places, and continues to do so after the cowboys and star fighters have all gone home.

Inside out projection

I recently took part in a Dance/Hack at Kinetech Arts, here in San Francisco. Kinetech is a group of technologists and dancers around whom a whole host of people (me included) orbit and take part when we can. The Dance/Hack is a yearly event where teams of dancers/technologists/musicians/visual artists get together to create some expression of motion technology. This year it was also linked to similar events in London and Amsterdam. In addition to special programs and performances, Kinetech has an open studio every Tuesday.

One Tuesday I brought my Kodak PixPro 360 camera, which takes a dome image 360 degrees around and a little over 210 degrees horizon to horizon. It records the anamorphic projection onto a square. Dancer/Choreographer, Megan Meyer, was very interested in what it did. She was interested in how the spherical nature of the image related to depictions found on ancient pottery. So we decided to try and put something together for the Dance/Hack.

h2_31-11-10

Greek pottery

The concept started out grand, of course – epic narrative, historical reference, comments on contemporary society, etc. The time constraints and the resource limitations meant scaling back the vision a bit.

I’ve been thinking about alternate projections since I was a kid and learned that Greenland wasn’t nearly as big as I’d been led to believe, that straws don’t actually bend in glasses of water, and that perspective is sort of skewing a cone into a cylinder. I am intrigued by the visual and narrative possibilities of full-dome projection, but the barrier to entry is very high – too high for the DIY spirit of the Dance/Hack. We needed a simpler, cheaper way of reconstructing the images onto some viewing surface that many people could view.

We started to think small. The idea started from Megan’s concepts of ancient pottery which, though now priceless museum antiquities, were created to be utilitarian, domestic items. Could we make the experience happen on a domestic scale?

Kodak sp360 web site

The Kodak 360 images remind me of chrome-ball photos taken on set for visual effects filming. The angles of the projection are different, but visually the results are very similar. What if I invert that process? If I could project the image off of a reflective sphere, and onto a larger, translucent ball, the resulting image would rectify itself into a relatively undistorted final result. It worked in my head while we were drinking coffee and drawing on scratch paper – so what could possibly go wrong? (within tolerances)

Here’s the theory:

The Kodak sp360 takes images through a very wide fisheye lens – with an over 200 degree field of view. The idea was to invert the distortion of flattening the dome onto the picture plane by projecting the picture plane back off of a dome.
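The geometry behind that idea can be sketched with the standard mirror reflection formula. This assumes an idealized setup that the real rig did not have: a perfectly round mirror ball of radius 1 and a projector beam made of parallel rays travelling straight down the axis toward it. The function names are made up for the example.

```python
import math


def reflect(direction, normal):
    """Mirror reflection of a direction about a unit surface normal."""
    d_dot_n = sum(d * n for d, n in zip(direction, normal))
    return tuple(d - 2.0 * d_dot_n * n for d, n in zip(direction, normal))


def reflected_angle_deg(offset):
    """Angle between the reflected ray and the axis pointing back at the projector.

    offset is how far off-center the beam ray hits the ball, from 0 (dead
    center) to 1 (grazing the edge), in units of the ball's radius.
    """
    normal = (offset, 0.0, math.sqrt(1.0 - offset * offset))
    incoming = (0.0, 0.0, -1.0)  # parallel beam travelling toward the ball
    rx, ry, rz = reflect(incoming, normal)
    return math.degrees(math.atan2(math.hypot(rx, ry), rz))


for offset in (0.0, 0.25, 0.5, math.sqrt(0.5), 0.9):
    print(round(offset, 3), round(reflected_angle_deg(offset), 1))
```

Under those assumptions the reflected angle is twice the arcsine of the offset, so the middle of the projected picture bounces straight back toward the projector, a ring at about 71% of the ball's radius is thrown out sideways, and the outer rim wraps around behind the ball. A small flat picture ends up spread over nearly a full sphere, which is why a slightly misaligned beam moves the image so much.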

grec_urn_figE

Of course this worked great in my head and on paper. And in a perfect world with enough time and resources, it probably would have worked well in practice. But part of the fun of a hackathon is fielding the unexpected and working within the real world constraints of the day.

grec_urn_figA

 

Here’s the reality:

To begin with we didn’t have a budget to buy a very round and reflective chrome ball like the kind used in VFX production, but Megan did find some very shiny silver Christmas ornaments. We didn’t have a source for spheres made of a highly transmissive translucent material, but IKEA has a good deal on round paper lanterns. Well – okay – we’re building a lamp, and the aesthetic direction is based around a wood and paper lantern. That’s great because the rig to hold the projector, the ornament, and the lantern was going to be built from wooden slats and dowels.

The key behind getting this all to work was establishing the correct alignment of projector beam to Christmas ball, and then to the paper lantern. This seemed easy because it should just be a matter of stacking one directly on top of the other and then changing the distances between them to adjust for which section of the image hit where on the sphere. But I hadn’t counted on the built-in convenience functionality of the little projector. Unlike the projectors I remember from the previous millennium, modern projectors have all kinds of corrections so that you don’t have to angle the projector or worry about “keystoning” the image.

grec_urn_figD

This meant that all of that proper alignment was in practice shifted wildly off of the “axis” of projection. And, in fact, the image hitting the sphere wouldn’t even be a square. Most of the day was spent with a Swiss Army knife saw, a cordless drill, and masking tape, trying to shim and bend everything to hold the ornament in the right place. Luckily dowel rods are flexible.

Here’s the result:

 

grec_urn_figC

It was a little clunky but had its own charm – like a cross between some wabi sabi rusticness and the tree from A Charlie Brown Christmas. But it worked, and definitely points to some interesting possibilities with bigger and better hardware.

What I find most intriguing about this projection is that it inverts the notion of looking out and looking in. The camera “sees” out, but the projection allows us to look into that view. I really like this look into an impossible slice of perspective – a view into the infinite.


 

Spherical Stereo Camera for Immersive Rendering

At SIGGRAPH this year in LA, I was really impressed by a presentation that Mach Kobayashi (currently at Google) gave at the PIXAR Renderman User’s Group. It involved how to render 360 panorama stills for viewing in VR. He pointed out that the naive approach of placing two panorama cameras side by side would break as you looked away from the plane the cameras were in because the views would cross. The stereo version of a broken clock being right twice a day.

vr_sphCam_naive

He had a great solution involving ray-tracing the pictures from the tangent vectors of a cylinder, where the cylinder width is the inter-pupillary distance. And it works great:

http://www.machwerx.com/2015/08/21/rendering-for-vr/

But I was left wondering “what happens if you look up or down?” The answer must involve a sphere. So I gave myself a little project: ray-trace from a sphere to get the stereo result from every angle. And it almost works – like 98% works.

The basic idea is to extend a basic spherical panorama camera by adding an offset to the ray origin position without adjusting the viewing direction. The image is rendered one pixel at a time by sweeping the “camera” horizontally in “u” and vertically in “v”. Like traveling the Earth by stepping a little to the east and snapping a one pixel picture of the sky all the way around, and then taking a step North, taking a single pixel picture and repeating. Luckily the computer is faster and the 3D scene database remains still for the shutter.
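In code, the per-pixel ray setup might look something like the sketch below. It is a guess at one reasonable way to place the offset, not the actual camera shader: it assumes y is up, +z is straight ahead and +x is to the right, uses a 64 mm inter-pupillary distance as a placeholder, and keeps the eye offset in the horizontal plane at a right angle to the heading.

```python
import math


def stereo_ray(u, v, eye, ipd=0.064):
    """Ray origin and direction for one pixel of a spherical stereo panorama.

    u and v run from 0 to 1 across and up the image; eye is -1 for the left
    eye and +1 for the right. ipd is in scene units (meters here).
    """
    longitude = (u - 0.5) * 2.0 * math.pi  # -pi..pi around the horizon
    latitude = (v - 0.5) * math.pi         # -pi/2..pi/2 from straight down to straight up

    # Viewing direction: the same for both eyes, exactly as in a mono panorama.
    direction = (
        math.cos(latitude) * math.sin(longitude),
        math.sin(latitude),
        math.cos(latitude) * math.cos(longitude),
    )

    # Horizontal unit vector at a right angle to the heading (assumed placement).
    right = (math.cos(longitude), 0.0, -math.sin(longitude))

    # Only the ray origin moves; the direction is left untouched.
    origin = tuple(eye * 0.5 * ipd * r for r in right)
    return origin, direction


left_origin, left_dir = stereo_ray(0.3, 0.7, -1)
right_origin, right_dir = stereo_ray(0.3, 0.7, +1)
print(left_origin, right_origin)
print(left_dir == right_dir)  # True
```

A renderer would call something like this once per pixel and per eye, tracing each ray into the scene; how the offset should behave as the view tilts toward straight up or straight down is exactly the part that remains tricky.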

 

vr_sphCam_fig
Camera assembly scans around a sphere in u and v

I have some more work to do – like trying to integrate animating live “hero” elements in with the static stereo background to see if it still holds up to the eye, but this will do for now. Here’s a link to a test that’s made for viewing in Cardboard: http://scottmsinger.com/vrar/sphcam/ The icky pinching at the top of the sky is from an attractive but non spherical sky and cloud texture.

vr_sphCam_sidebyside
Side by side
vr_sphCam_landscape_D.sphericalRight.2
right eye from spherical projection stereo camera

Experimenting with particles, volumes, refractive and reflective surfaces is next. It’s promising, but tricky – the amount of filtering and pixel sampling is going to take some dialing in.

But there’s a problem. And it’s the same problem that map makers have when they try to flatten the globe into a single plane – you end up getting a mismatch of sampling distances and densities toward the poles. In my renders this results in a little “S” shaped warping as cameras are pointing too far up or down. The results are still pretty cool, and maybe good enough for many applications, but far from perfect. The other issue is that it means that just as many samples circle the sphere at the poles as at the equator, and whether you want to look at it as too much information in some places or not enough in the others – the fact is that it’s not a very efficient use of rendering resources.
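The size of that inefficiency is easy to estimate. The sketch below assumes a plain latitude/longitude image in which every row has the same number of samples; the 1024-row figure and the function names are arbitrary choices for illustration.

```python
import math


def row_latitude(row, rows):
    """Latitude in radians at the center of an image row, from -pi/2 to pi/2."""
    return (row + 0.5) / rows * math.pi - math.pi / 2.0


def row_oversampling(row, rows):
    """How many times more densely a row is sampled around its circle than the equator."""
    return 1.0 / math.cos(row_latitude(row, rows))


def wasted_fraction(rows):
    """Share of samples beyond what equator-level density would need on a sphere."""
    useful = sum(math.cos(row_latitude(r, rows)) for r in range(rows))
    return 1.0 - useful / rows


rows = 1024
print(round(row_oversampling(rows // 2, rows), 3))  # at the equator
print(round(row_oversampling(0, rows)))             # the row nearest a pole
print(round(wasted_fraction(rows), 3))              # overall
```

The circle of latitude shrinks with the cosine of the latitude while the sample count per row stays fixed, so the overall waste works out to about 1 − 2/π, a little over a third of all the rays traced.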

So I’m working on another approach that uses geodesic spheres to derive sample points, and rotating the sphere points without rotating the normals. Oh there will be problems with that too I’m sure. But this is when I get to fall back on my MFA in painting and let all my higher math colleagues solve the problems I make for them.

vr_sphCam_geod_fig
?????