IDCT

Watching this program execute (c / ctrl-c) at adjustable speed (ctrl-t / alt-t), it becomes clearer that printing an image is basically a pipeline: IDCT → Y′CbCr to RGB → decimate → sharpen → block render. Thanks to Blinkenlights, we can also see that the RGB conversion is going slower than it should, because the code isn't benefiting from SSE register vectorization. Many other common micro-optimization issues, such as register spillage, become super apparent as well.
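
To make the Y′CbCr to RGB stage concrete, here is a minimal scalar sketch in C. It is not the program's actual code: the function names (`ycbcr_to_rgb`, `ycbcr_planes_to_rgb`) are hypothetical, and it assumes full-range JFIF-style coefficients, which the text above doesn't specify. The per-pixel work is a handful of independent multiply-adds, which is the kind of loop that would normally be a good candidate for SSE vectorization.

```c
#include <stddef.h>
#include <stdint.h>

/* Round to nearest and clamp to the 0..255 range of an 8-bit channel. */
static uint8_t round_clamp_u8(float x) {
  int v = (int)(x < 0 ? x - 0.5f : x + 0.5f);
  return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Convert one pixel. Assumes full-range (JFIF-style) Y'CbCr, where
   Cb and Cr are stored with a +128 offset. */
void ycbcr_to_rgb(uint8_t y, uint8_t cb, uint8_t cr, uint8_t rgb[3]) {
  float d = (float)cb - 128.0f; /* blue-difference chroma, centered */
  float e = (float)cr - 128.0f; /* red-difference chroma, centered */
  rgb[0] = round_clamp_u8(y + 1.402f * e);
  rgb[1] = round_clamp_u8(y - 0.344136f * d - 0.714136f * e);
  rgb[2] = round_clamp_u8(y + 1.772f * d);
}

/* Convert n pixels from three separate planes into interleaved RGB.
   Each iteration is independent of the others. */
void ycbcr_planes_to_rgb(const uint8_t *y, const uint8_t *cb,
                         const uint8_t *cr, uint8_t *rgb, size_t n) {
  for (size_t i = 0; i < n; ++i) {
    ycbcr_to_rgb(y[i], cb[i], cr[i], rgb + 3 * i);
  }
}
```

With chroma at its neutral value of 128, the output is a gray equal to the luma, so `ycbcr_to_rgb(128, 128, 128, rgb)` gives `{128, 128, 128}`.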