Mixing Packed RGB Pixels Efficiently

This describes a mixing/average operation on 15-bit RGB pixels of the form 0RRRRRGG GGGBBBBB. Below, X and Y refer to the two source pixels. This discussion can be applied to any packed format, with each channel occupying differing numbers of bits (for example, 16-bit RGB format with 6 bits for the green channel), and more than three channels.

The mix operation finds the average of the channels of each pixel:

    result = (X + Y) / 2;

It would be more efficient to add them in packed form, rather than unpacking them and applying the operation on each individually. To add the packed channels, there must be room for a carry from each channel, that is, at least one zero bit between each channel. A simple way to insert zero bits is to move the green channel out from between the red and blue, up into the high bits:

    X = ((X << 16) | X) & 0x03E07C1F;

Do the same with Y, add them, shift right, then repack the bits:

    Y = ((Y << 16) | Y) & 0x03E07C1F;
    result = (X + Y) >> 1 & 0x03E07C1F;
    result = (result >> 16) | result;

A bad way to mix two pixels is to clear the low bit of each channel before adding, since this results in lost precision. Consider mixing 0x1F with 0x1F; this should produce 0x1F, but the following produces 0x1E:

    // bad way
    result = ((X & 0x7BDE) + (Y & 0x7BDE)) >> 1;

The bad way points to a good way that's nearly as efficient. The bad way masks to ensure there are zero bits between each channel; is there were a way to ensure zero bits without sacrificing accuracy?

It's the low bit of each channel that needs to be dealt with. There are four combinations of the low bits from a given channel of X and Y: 0+0, 0+1, 1+0, and 1+1. For example, 0x1F has the low bit set, and 0x1E doesn't, so this is combination 1+0. Consider each of the four cases and what needs to occur:

When both low bits are 0, nothing needs to be done; the values can be added together without any masking:

    result = (X + Y) >> 1;

When both low bits are 1, the same applies, since the two low bits add together to result in zero (with a carry to the next bit). This is a bit subtle, so consider a concrete example, adding these two colors:

      RRRRR GGGGG BBBBB
    0 00001 00001 11111 X
+   0 00001 00001 00001 Y
    -------------------
    0 00010 00011 00000
    R RRRRG GGGGB BBBB-

The two red channels with value 1 left the output bit at that position clear for a possible carry from the green channel. The same for the green's low bits added together, and here there was a carry from the blue that properly appears in the result.

When only one low bit is 1, it needs to be cleared manually. We can determine low bits which fit this case by XORing X and Y together and masking to just the low bits:

      RRRRR GGGGG BBBBB
    0 00001 00001 11111 X
XOR 0 00000 00000 00001 Y
    -------------------
    0 00001 00001 11110
AND 0 00001 00001 00001
    -------------------
    0 00001 00001 00000 adjustment

Finally, to remove the low bits that fit this special case, simply subtract the masked result above from the sum of X and Y:

      RRRRR GGGGG BBBBB
    0 00001 00001 11111 X
+   0 00000 00000 00001 Y
    -------------------
    0 00001 00010 00000
-   0 00001 00001 00000 adjustment
    ------------------- 
    0 00000 00001 00000
    R RRRRG GGGGB BBBB-

This is the correct result (it just needs to be shifted right by one bit). In the intermediate result, note how the low bit of G added with the carry from B, but was reversed by subtracting the low bit back out. Here's code which implements this optimal algorithm:

    result = (X + Y - ((X ^ Y) & 0x0421)) >> 1;

This uses only one operation more than the bad code from earlier, and matches the original code's output using half the operations (5 versus 11). You can even set the rounding direction; add the adjustment instead of subtracting it to round up instead of down, for example 0x1E + 0x1F would produce 0x1F when rounding up, and 0x1E when rounding down.

    // rounds up
    result = (X + Y + ((X ^ Y) & 0x0421)) >> 1;

Additionally, this can be used to mix two RGB pixels contained in a 32-bit integer, something the first implementation would have taken 22 operations to do. Nice!

Adding RGB Pixels With Clamping

Subtracting RGB Pixels With Clamping

Back to Blargg's Information