Side note for CS junkies: dig how dramatically better Huff is able to compress this clip. I'll let you Google for the underlying mechanics of Huffman trees yourself, but suffice it to say the algorithm likes you to use as few "letters" of the "alphabet" as possible. Typical text only uses about a quarter of the 8-bit ASCII table, so it compresses very well indeed (much better than the 25% you'd save just by dropping to 6-bit codes, even, because some letters occur far more frequently than others). What does that have to do with video? Well, it's a simple isomorphism -- Y, U, and V values are letters, and the 8-bit gamut is the alphabet. Actually, the computer doesn't even care about the conceptual difference, since BenRG's implementation uses a static, predetermined tree (as a realtime codec generally does, since it can't look at the whole clip before it starts encoding). I'll bet if you used this clip to build a new tree, the compression would be even better.
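If you'd rather poke at it than Google it, here's a rough Python sketch of the static-versus-custom tree idea. Everything in it is my own stand-in: the "static" table just assumes all 256 values are equally likely (it is not the codec's real table), the "clip" is a made-up list that only touches 64 values, and it codes raw values directly, skipping the prediction step a real codec would run first.

```python
import heapq


def huffman_code_lengths(counts):
    """Return {symbol: code length in bits} for a Huffman tree built from counts."""
    if len(counts) == 1:
        # A lone symbol still needs one bit per occurrence.
        return {sym: 1 for sym in counts}
    # Heap entries are (weight, tiebreak, symbols in this subtree).
    heap = [(w, i, [s]) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(counts, 0)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, syms1 = heapq.heappop(heap)
        w2, _, syms2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        for s in syms1 + syms2:
            lengths[s] += 1
        heapq.heappush(heap, (w1 + w2, tiebreak, syms1 + syms2))
        tiebreak += 1
    return lengths


def encoded_bits(data, lengths):
    """Total bits needed to code data with the given per-symbol code lengths."""
    return sum(lengths[v] for v in data)


# Hypothetical clip: only 64 of the 256 possible values ever show up.
narrow_clip = list(range(64)) * 100

# Stand-in static table (an assumption): every 8-bit value equally likely.
static_lengths = huffman_code_lengths({v: 1 for v in range(256)})

# Table tuned to the clip itself.
tuned_counts = {}
for v in narrow_clip:
    tuned_counts[v] = tuned_counts.get(v, 0) + 1
tuned_lengths = huffman_code_lengths(tuned_counts)

print("static table:", encoded_bits(narrow_clip, static_lengths), "bits")
print("tuned table: ", encoded_bits(narrow_clip, tuned_lengths), "bits")
```

The tuned table wins here purely because the alphabet shrank. Skew the counts so a few values dominate and the gap gets wider still, which is the "some letters occur far more frequently" effect.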
Why? Oh yes, because of my original point: when your video is clipping like this, chances are you're also failing to saturate the white point. As it turns out, if you look at it with a histogram tool, you'll see that a large chunk of the "alphabet" is going unused. So it goes.
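No histogram tool handy? Counting the unused "letters" yourself takes a few lines of Python. This is only a sketch: the sample values below are invented (crushed to 16 at the bottom, never getting above 180 at the top) and `unused_levels` is a hypothetical helper name of mine, not part of any capture software.

```python
from collections import Counter


def unused_levels(samples, depth_bits=8):
    """Return (number of values never used, histogram of the values that were)."""
    histogram = Counter(samples)
    total_levels = 1 << depth_bits  # 256 for 8-bit video
    return total_levels - len(histogram), histogram


# Made-up luma samples: everything below 16 is clipped up to 16,
# and nothing ever reaches past 180.
luma = [max(16, v) for v in range(0, 181)]

unused, histogram = unused_levels(luma)
print("unused levels:", unused)
print("darkest value:", min(histogram), "count:", histogram[min(histogram)])
print("brightest value:", max(histogram))
```

A pile-up of counts at the darkest value is the clipping, and the gap between the brightest value and 255 is the white point you never reached.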