A simple prediction-based video codec
I plan to clean up this code some more and someone said I should pick a
fourcc for it. Call the version below 0.1 and I'll promise to fix this
up real soon now.
/**
* This source file implements a simple form of information-preserving
* differential coding, that is, predictive coding without quantization.
* The basic idea is to predict the value of a pixel and encode the
* error.
*
* I first got interested in the idea from the 'Huffyuv' codec by Ben
* Rudiak-Gould and released under the GPL. See
* http://www.math.berkeley.edu/~benrg/huffyuv.html for more information
* about 'Huffyuv'.
*
* There are a few ways of making a good prediction. My reference is
* 'Image and Video Compression for Multimedia Engineering' by Yun Q.
* Shi and Huifang Sun. One good reference they seemed to use was:
* Haskell, B. G., Frame replenishment coding of television, in 'Image
* Transmission Techniques', W.K. Pratt (Ed.), Academic Press, New York,
* 1979.
*
* Here are some ideas: Say we have the following samples:
*
* a b c m n o
* d e f p q r
* g h i s t u
* j k l v w x
* | +----current frame
* previous frame
*
* So, say we're trying to encode sample t. There are a few easy
* prediction schemes. The few I played with were:
*
* Element difference: E = t - s
* Field-based interpolation: E = t - ((n + s) / 2)
* or E = t - ((n + s + m) / 3)
* Median (field based): E = med(n, s, (n + s) / 2)
* or E = med(n, s, n + s - m)
* Frame difference: E = (t - s) - (h - g)
* (E = error)
*
* Some other neat suggested ones were:
* Field difference:
* E = (t - ((q + w) / 2)) - (h - ((e + k) / 2))
* Line difference:
* E = (t - n) - (h - b)
*
* Some of the above names are from Shi and Sun. Also, some were based
* on Ben Rudiak-Gould's page. Basically though, what I found was that
* the frame difference idea of using the previous frame did not produce
* a code with much less entropy (<.1 bit). As well, using anything
* from 2 scanlines above is another bad idea. This leaves only the
* element difference, that is, using the previous pixel value, as the
* best option.
*
* I seem to get an average compression of about 55%-65% of the original
* uncompressed size.
*
* The code below is optimized for codewords < 16 bits. I first
* generated a code with a max wordlength of 12 bits based on
* 'Transformers: The Movie' VHS sampled at 352x480. I generated the
* new code below from 'Austin Powers' VHS, and I think I like it
* better. Sampling TV at 352x480 gave me longer codes without much
* better entropy values, so I'm keeping this code for now.
*
* I use the source code found at
* http://www.cs.mu.oz.au/~alistair/abstracts/wads95.html
* to generate the word lengths for my code. The code I build at
* runtime based on these lenghts. I generate a 'canonical huffman
* code'. I got help for coding this (and optimizing it) from the
* website 'Practical Huffman Coding' at
* http://www.compressconsult.com/huffman/
*
* You'll note that I use the same compression table for both the luma
* and the chroma. I know, I could probably do even better by building
* different codes for each, but I didn't. Since I'm only sampling at
* 4:2:0 rates in my app, I'm ok with this situation, for now.
*
* - Billy Biggs <vektor@dumbterm.net>
*/