Tuesday 20 August 2013

Tracking video part i

This is a complicated, complicated topic which involves a lot of deep maths.  I don't understand any of the underlying maths behind it but thankfully the magnificent software 3D Equalizer (3DE) does a lot of the heavy lifting.  I have quoted from Wikipedia and other sources where need be (i.e. where I don't understand) but I will try to put things into my own, simpler terms, for my own benefit as much as the reader's.  I'm not being patronising when I say that, as I could include sentences such as:

"Because the value of XYi has been determined for all frames that the feature is tracked through by the tracking program, we can solve the reverse projection function between any two frames as long as P'(camerai, XYi) ∩ P'(cameraj, XYj) is a small set. Set of possible camera vectors that solve the equation at i and j (denoted Cij).
Cij = {(camerai,cameraj):P'(camerai, XYi) ∩ P'(cameraj, XYj) ≠ {})" Wiki
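To unpack that a little: P' is the function that projects a 3D point through a camera onto the 2D image, and the solver hunts for cameras whose projections line up with the tracked 2D positions on every frame.  Here is a minimal Python sketch of such a projection function - my own simplified pinhole model for illustration, not anything taken from 3DE:

```python
import numpy as np

def project(camera_pos, camera_rot, focal_length, point_3d):
    """Pinhole projection: a 3D world point -> 2D image coordinates.

    camera_pos   : (3,) camera centre in world space
    camera_rot   : (3, 3) world-to-camera rotation matrix
    focal_length : focal length in image units
    point_3d     : (3,) world point
    """
    # Move the point into the camera's coordinate system.
    p_cam = camera_rot @ (np.asarray(point_3d, dtype=float) - camera_pos)
    # Perspective divide: the deeper the point, the closer to centre.
    return focal_length * p_cam[:2] / p_cam[2]

# Each tracked 2D position XY_i on frame i must equal the projection of
# the feature's (unknown) 3D position through the (unknown) camera_i.
# The solver searches for cameras and points that satisfy this at once.
```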

The 3D Equalizer interface.  Source: http://www.kopona.net/soft/multimedia/28725-3d-equalizer-v4r1b9.html

match mov·ing  
1. In cinematography, match moving is a visual-effects technique that allows the insertion of computer graphics into live-action footage.

3DE is a piece of match moving software.  Match moving is a general term encompassing a few different disciplines, all of which have the same end goal: matching the movement of a camera in a piece of video so that CGI can be added.

The problem is thus:  I have a piece of CGI I would like to add to a piece of video footage.  I render out my piece of CGI and put it into place at the start of my video (as per the typewriter).  When I play the video, the CGI does not move with it.  It doesn't move because the render is not animated - the camera in Maya through which I rendered did not move, so the CGI remains still.

To get over this problem I need to match the movements of the camera in 3D space.  All of the information about the camera's movement can be calculated from the 2D video file.  This process uses a lot of maths, and is basically triangulation - much like the positioning maths behind GPS.

tri·an·gu·la·tion  
1. In trigonometry and geometry, triangulation is the process of determining the location of a point by measuring angles to it from known points at either end of a fixed baseline.
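To make the idea concrete, here is a toy triangulation in Python - my own minimal sketch, not how 3DE actually solves anything.  Given two known camera positions and the ray each one casts through the same tracked feature, the feature's 3D position is approximately where the two rays cross:

```python
import numpy as np

def unit(v):
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def triangulate(origin_a, dir_a, origin_b, dir_b):
    """Intersect two rays (origin + unit direction) approximately:
    return the midpoint of the shortest segment between them, since
    noisy rays never meet exactly."""
    w = origin_a - origin_b
    b = dir_a @ dir_b
    d = dir_a @ w
    e = dir_b @ w
    denom = 1.0 - b * b            # ~0 would mean parallel rays
    t_a = (b * e - d) / denom      # distance along ray a
    t_b = (e - b * d) / denom      # distance along ray b
    closest_a = origin_a + t_a * dir_a
    closest_b = origin_b + t_b * dir_b
    return (closest_a + closest_b) / 2

# Two cameras a metre apart, each sighting the same feature at (0, 0, 5):
cam_a = np.array([-0.5, 0.0, 0.0])
cam_b = np.array([0.5, 0.0, 0.0])
print(triangulate(cam_a, unit([0.5, 0, 5]), cam_b, unit([-0.5, 0, 5])))
# -> approximately [0, 0, 5]
```

In the real problem the camera positions are unknown too, which is why the software needs so many tracked points across so many frames.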

The first step is to track features in the video (each one a tracking point).  These are normally points of contrast or distinctive shapes.  The most important, fundamental, absolutely undeniably vital point to take away from this is that the features being tracked must be stationary within the scene.  The purpose of tracking these features is to allow the software to calculate the position of the camera relative to the scene.  If the features being tracked are moving objects such as leaves blowing in the wind, people, vehicles, etc. then the software will be basing its calculation on incorrect data and will return an incorrect result.  This is similar to trying to work out the location of a sound without realising you're listening to the echo.

So the video is played through 3DE and distinctive features are tracked.  To track a feature one zooms in on it, marks it as a feature to be tracked, and plays through the video.  The software moves from frame to frame tracking the feature, and the user adjusts the definition of the feature's size and shape to ensure the tracking point stays in the same position.  It must not waver by a pixel.  Without a human there to tell the software what should be tracked, it will not work.  There are automatic solutions, but we are discussing the manual process so they can be put to one side.  The human, in this case, is vital.  Adjusting the contrast, saturation and brightness of the image may help the software track the feature.
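For a flavour of what "tracking" means computationally, here is a toy tracker in Python - a bare-bones sum-of-squared-differences search of my own, nothing like as sophisticated as what 3DE really does:

```python
import numpy as np

def track_feature(prev_frame, next_frame, x, y, size=8, search=12):
    """Follow a feature from one greyscale frame to the next by finding
    the patch in a small search window that best matches the original
    patch (lowest sum of squared differences).  Assumes the feature
    sits well away from the image borders."""
    template = prev_frame[y - size:y + size,
                          x - size:x + size].astype(float)
    best_score, best_pos = np.inf, (x, y)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            patch = next_frame[y + dy - size:y + dy + size,
                               x + dx - size:x + dx + size].astype(float)
            score = np.sum((patch - template) ** 2)
            if score < best_score:
                best_score, best_pos = score, (x + dx, y + dy)
    return best_pos  # the feature's new (x, y), to the nearest pixel
```

A real tracker refines this to sub-pixel accuracy, which is why "it must not waver by a pixel" is such painstaking work by hand.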

It is important to build up a good spread of tracking points.  The software cannot make triangulation calculations from a single point.  It may manage with only ten points, but upwards of 20 are normally required.  Theoretically there is no upper limit to the number of tracking points.  There must be tracking points close to the camera, far away, and in the middle distance.  This spread makes it easier for the software to see parallax in the video.

par·al·lax
1. An apparent change in the direction of an object, caused by a change in observational position that provides a new line of sight.

Parallax is a big part of what we understand as perspective.  If you are travelling in a car, objects close to you whiz by while the horizon moves very slowly.  This magnificent GIF explains it perfectly.

Source: http://en.wikipedia.org/wiki/File:Parallax.gif

We all implicitly understand this.  The software understands it too: things close move quickly, things further away move slowly.  By tracking points in the video (for example, the corner of each cube in the GIF) the software is able to establish the relative distance of objects from the camera.  From that, the software is able to calculate the position of the camera.
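The numbers behind that intuition are simple.  This is a hypothetical pinhole setup of my own, just to show the one-over-depth relationship:

```python
# With a pinhole camera, a point's image position is x = f * X / Z.
# If the camera slides sideways by t, the image position shifts by
# f * t / Z: inversely proportional to depth.  Arbitrary numbers below.
f = 35.0            # focal length, arbitrary units
camera_shift = 1.0  # the camera moves one unit to the right

for depth in (2.0, 10.0, 100.0):
    image_shift = f * camera_shift / depth
    print(f"object at depth {depth:>5}: image shifts by {image_shift:.2f}")

# Near objects whip across the frame; distant ones barely move.  The
# differing shifts of the tracking points are what give away their depths.
```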

I will end this update here.  There is more to say about video tracking but I will split it up.  I will talk a little more about the software side of it, and then explain some of the problems we have in transferring the data generated by the tracking software into Maya.

The main points to take away are:

- To put CGI into a video we need to track the camera's movements.
- We do that by match moving - tracking specific points within the footage.
- The software takes that and triangulates the position of the points in 3D space.
