Note: This is a more technical post than usual, and about 5 months late.
The decoding in the OBE C-100 decoder was optimised to make use of instructions in modern CPUs and this blog post explains how we did it:
HD-SDI video uses 10-bit pixels but computers operate in bytes (8-bits). However, 10-bit professional video doesn’t fit nicely into bytes. Instead, 10-bit video on a computer is stored in memory like this:
The X represents an unused bit - note how in total 12 out of 32 of the bits are unused (that’s 37.5%). It’s very wasteful if the data needs to be transferred to a piece of hardware like a Blackmagic SDI card. Virtually all professional SDI cards use the ‘v210’ format that was first introduced by Apple in the 90s  and v210 improves the efficiency of 10-bit storage by packing the 10-bit video samples as follows:
(adapted from )
Now only 2 out of the 32-bits are unused, a major improvement. Using the old v210 encoder in FFmpeg, each pixel is loaded from memory, shifted to the correct position and “inserted” using the OR operation. When doing this on 1920x1080 material, this involves about 250 million of these operations every second. More CPU time is spent packing the pixels for display than actually decompressing them from the encoded video!
Clearly, we’ve got to do something about this - Thanks to the magic of SIMD instructions (in this case SSSE3 and AVX) we can instead process 12 pixels in one go :
This can be (unscientifically) benchmarked with the command:
ffmpeg -pix_fmt yuv422p10 -s 1920x1080 -f rawvideo -i /dev/zero -f rawvideo -vcodec v210 -y /dev/null
A 3x speed boost.
But, a lot of content that the decoder receives is 8-bit which has this packing format:
In existing software decoders, this needs to be converted to the 10-bit samples in the first picture and then packed into v210, a two step process. But, we can now just do this in a single step.
ffmpeg -pix_fmt yuv422p -s 1920x1080 -f rawvideo -i /dev/zero -f rawvideo -vcodec v210 -y /dev/null
What more could be done:
Thanks must go to those who helped review this code.
(This is from Apple’s venerable Letters from the Ice Floe)