From the perspective of JPEG's DCT, a line of pixels represents a sampled waveform, and therefore a block with sharp black/white edges is similar to a square wave composed of only minimum and maximum sample values:
Inevitably, DCT and JPEG quantization are going to distort the waveform. The distortion usually has a wavy shape that increases some values and decreases others. This is the source of "ringing" artifacts.
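This is easy to reproduce with a toy 1-D model. The sketch below is a simplification and not JPEG itself: an orthonormal DCT-II plus a single uniform quantizer step (the step of 40 is an arbitrary "low quality" choice) stand in for JPEG's full 2-D transform and quantization tables, but the ringing behaves the same way:

```python
import math

def dct(samples):
    """Orthonormal 1-D DCT-II, the transform JPEG applies per row/column."""
    n = len(samples)
    return [
        (math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
        * sum(x * math.cos(math.pi * (i + 0.5) * k / n)
              for i, x in enumerate(samples))
        for k in range(n)
    ]

def idct(coeffs):
    """Inverse of dct() above (DCT-III with matching scaling)."""
    n = len(coeffs)
    return [
        sum((math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
            * c * math.cos(math.pi * (i + 0.5) * k / n)
            for k, c in enumerate(coeffs))
        for i in range(n)
    ]

def quantize(coeffs, step=40):
    """Uniform quantizer: a crude stand-in for a JPEG quantization table."""
    return [round(c / step) * step for c in coeffs]

square = [0, 0, 0, 0, 255, 255, 255, 255]   # one sharp black/white edge
decoded = idct(quantize(dct(square)))
# The flat black and white runs come back wavy: some samples shoot
# past 0/255 (overshoot), others land inside the range (undershoot).
print([round(v, 1) for v in decoded])
```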
Because the values of the input waveform were already at the minimum/maximum, the distortions will push parts of the wave out of range: overshoot (marked in cyan). Other parts will have values lowered below the maximum or raised above the minimum: undershoot (red).
The JPEG decoder expects that some values will overshoot and clips the waveform to fit the allowed range (0–255 in RGB output). After clipping, the overshooting values are flattened (<0 replaced with 0, >255 replaced with 255), so only the undershooting half of the distortions remains visible:
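In code, this final decoder step is just a clamp. The sample values below are made up to show both cases:

```python
def clip(value, lo=0.0, hi=255.0):
    """Clamp a decoded sample into the output range, as a JPEG decoder does."""
    return max(lo, min(hi, value))

# Hypothetical decoded samples around a white area:
decoded = [-6.0, 4.0, 263.0, 250.0, 255.0]
visible = [clip(v) for v in decoded]
# Overshoot (-6 and 263) is flattened away; undershoot (4 on the black
# side, 250 on the white side) survives as a visible gray artifact.
print(visible)  # [0.0, 4.0, 255.0, 250.0, 255.0]
```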
Overshoot can't make white areas any brighter, but undershoot makes white areas darker. This is the reason why black-on-white text in JPEG looks "dirty" with gray square halos around it.
Here's the trick
The waveform is altered before quantization: minimum and maximum values are extended to overshoot (everything else remains unchanged):
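A minimal version of this pre-processing step might look like the sketch below. The fixed margin of 32 is just an illustrative choice, and the flat extension is the "blind" variant that the second part of the trick later refines:

```python
def extend_extremes(samples, lo=0, hi=255, margin=32):
    """Push samples sitting at the range limits out of range, so that
    ringing around them lands mostly in the part of the waveform that
    the decoder will clip away. All other samples are left unchanged."""
    return [s + margin if s >= hi else
            s - margin if s <= lo else
            s
            for s in samples]

print(extend_extremes([0, 0, 130, 255, 255]))  # [-32, -32, 130, 287, 287]
```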
The modified waveform will have a higher amplitude, and after JPEG compression the distortions will be more likely to remain in the overshooting range:
Clipping done by the JPEG decoder will flatten all overshooting areas, and therefore hide all of the distortions:
The image will appear to have sharp edges without any ringing artifacts and "dirty" background!
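The end-to-end effect can be checked with the same toy 1-D model as before (this is a simplification, not real JPEG: an orthonormal DCT and one uniform quantizer step of 40 stand in for the full codec, and the overshoot margin of 32 is an arbitrary illustrative value):

```python
import math

def dct(samples):
    """Orthonormal 1-D DCT-II."""
    n = len(samples)
    return [
        (math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
        * sum(x * math.cos(math.pi * (i + 0.5) * k / n)
              for i, x in enumerate(samples))
        for k in range(n)
    ]

def idct(coeffs):
    """Inverse of dct() above."""
    n = len(coeffs)
    return [
        sum((math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
            * c * math.cos(math.pi * (i + 0.5) * k / n)
            for k, c in enumerate(coeffs))
        for i in range(n)
    ]

def quantize(coeffs, step=40):
    return [round(c / step) * step for c in coeffs]

def clip(v):
    return min(255.0, max(0.0, v))

def visible_error(original, encoder_input):
    """Total difference between the clipped decoder output and the
    pixels the viewer was supposed to see."""
    decoded = [clip(v) for v in idct(quantize(dct(encoder_input)))]
    return sum(abs(d - o) for d, o in zip(decoded, original))

square = [0, 0, 0, 0, 255, 255, 255, 255]
# The trick: feed the encoder an out-of-range version of the same edge.
extended = [s - 32 if s == 0 else s + 32 for s in square]

plain_err = visible_error(square, square)      # visible ringing
trick_err = visible_error(square, extended)    # ringing hidden by clipping
print(plain_err, trick_err)
```

In this toy run the pre-extended waveform decodes, after clipping, to noticeably less visible error than the unmodified one.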
Second part of the trick
DCT breaks down the waveform into a sum of frequencies, and square waves are an edge case that's hardest to represent this way.
We can pretend that the waveform we're encoding has been clipped, and perform de-clipping. De-clipping only extends the waveform in the range that is clipped by the JPEG decoder, so this modification won't alter visible pixels of the image.
Instead of just blindly increasing minimum/maximum values as before, we're going to extrapolate them with splines to make them "rounder" and therefore reduce the sharpness of the hard-to-encode corners of the square wave:
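For reference, a uniform Catmull-Rom segment can be evaluated as below; values of `t` above 1 extrapolate smoothly past the last control point, which is the de-clipping use. This is just the textbook formula, not the article's actual implementation:

```python
def catmull_rom(p0, p1, p2, p3, t):
    """Uniform Catmull-Rom spline: interpolates between p1 (t=0) and
    p2 (t=1), using p0 and p3 to set the tangents; t > 1 extrapolates
    beyond p2."""
    return 0.5 * ((2.0 * p1)
                  + (-p0 + p2) * t
                  + (2.0 * p0 - 5.0 * p1 + 4.0 * p2 - p3) * t * t
                  + (-p0 + 3.0 * p1 - 3.0 * p2 + p3) * t ** 3)

# On a straight ramp the spline simply continues the line...
print(catmull_rom(0, 1, 2, 3, 1.5))    # 2.5
# ...while on a flattening edge the extrapolation bends over,
# producing the "rounder" overshoot described above:
print(catmull_rom(0, 1, 2, 2.5, 1.5))  # 2.21875, below the linear 2.5
```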
A secondary benefit of this is that spline extrapolation won't create unwanted sharp edges around smooth transitions between white and light gray. The amount of overshoot added for deringing will be roughly proportional to the sharpness of the edges it affects.
Random facts
- In my implementation I've used Catmull-Rom splines and an arbitrary amount of extrapolation, but only because it was good enough and easy to implement. It'd be nice to find an optimal de-clipping method, but empirically even relatively simple hacks give decent results.
- This trick only works for black and white edges, and can't hide ringing around gray-to-gray edges. However, it also doesn't affect images without sharp edges, so it won't cause any regressions in encoding of typical photographic images.
- The improvement is most visible in black-on-white text and line art. White-on-black doesn't benefit as much, probably due to gamma. In my initial implementation I handle black-on-white only.
- Deringing is especially valuable when images are re-encoded. It prevents ringing artifacts from accumulating.