Currently we split the image into four separate single-channel images, blur each of them, and then join them back together. I suspect we could do better by blurring the whole image at once.
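To make the idea concrete, here is a rough sketch of a single-pass blur over an interleaved RGBA8 buffer. It is only an illustration: the function name BoxBlurRowsRGBA is made up, it uses a plain horizontal box blur with clamped edges (both assumptions, not necessarily what our code or Skia does), and it is not the patch mentioned below. The point is just that all four channels of a pixel are handled in the same loop iteration, so no split/join step is needed.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: horizontal box blur over interleaved RGBA8 data
// (4 bytes per pixel). All four channels are processed together, so the
// image is never separated into planes. Edges are clamped (an assumption).
std::vector<uint8_t> BoxBlurRowsRGBA(const std::vector<uint8_t>& src,
                                     int width, int height, int radius) {
  if (width <= 0 || height <= 0 || radius < 0) {
    return src;  // Nothing sensible to blur; return the input unchanged.
  }
  std::vector<uint8_t> dst(src.size());
  const uint32_t window = 2u * static_cast<uint32_t>(radius) + 1u;

  for (int y = 0; y < height; ++y) {
    const size_t rowOffset = static_cast<size_t>(y) * width * 4;
    const uint8_t* row = src.data() + rowOffset;
    uint8_t* out = dst.data() + rowOffset;

    // Running sums for R, G, B, A, primed with the window centred on x = 0.
    uint32_t sum[4] = {0, 0, 0, 0};
    for (int i = -radius; i <= radius; ++i) {
      const int x = std::min(std::max(i, 0), width - 1);
      for (int c = 0; c < 4; ++c) {
        sum[c] += row[x * 4 + c];
      }
    }

    for (int x = 0; x < width; ++x) {
      // Write the rounded average for all four channels of this pixel.
      for (int c = 0; c < 4; ++c) {
        out[x * 4 + c] = static_cast<uint8_t>((sum[c] + window / 2) / window);
      }
      // Slide the window one pixel to the right.
      const int add = std::min(x + radius + 1, width - 1);
      const int drop = std::max(x - radius, 0);
      for (int c = 0; c < 4; ++c) {
        sum[c] = sum[c] + row[add * 4 + c] - row[drop * 4 + c];
      }
    }
  }
  return dst;
}
```

A real implementation would also need a vertical pass, and the inner four-channel loops are the part that SIMD code would collapse into one vector operation per pixel.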
A good experiment here would be to rig up Skia's SkBlurImageFilter and see how we compare on performance. Their code has some SSE optimizations, and it operates on all the channels of a pixel at once. That would at least tell us whether we're really slower, without investing too much effort.
I have a patch somewhere that does this. Let me dig it up.