We should optimize texture conversions by doing them GPU-side at texImage2D upload time. That would let us upload unusual source data such as YCbCr directly, and do (un)premultiplication and color conversions in the same pass. We could also handle odd cases like zeroing textures for texImage2D calls with null data in WebGL, or possibly scaling, if that'd be useful. Note that many mobile platforms don't support float render targets, so we will have to detect what we can and can't do.
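To make the idea concrete, here is a rough sketch of how the fragment shader for one such conversion could be generated. It only covers an RGBA <-> BGRA channel swap and (un)premultiplication; YCbCr, color conversion and scaling are left out. The names ConversionOptions, buildConversionFragmentShader, uSource and vTexCoord are made up for illustration, and the GLSL assumes a WebGL 1 (GLSL ES 1.00) context.

```typescript
// Hypothetical option set; none of these names come from an existing implementation.
interface ConversionOptions {
  swizzleBGRA: boolean; // source texels were uploaded in BGRA order
  alpha: "keep" | "premultiply" | "unpremultiply";
}

// Builds GLSL ES 1.00 fragment shader source for one conversion.
// The shader samples the raw uploaded texture and writes converted RGBA.
function buildConversionFragmentShader(opts: ConversionOptions): string {
  const body: string[] = ["  vec4 c = texture2D(uSource, vTexCoord);"];
  if (opts.swizzleBGRA) {
    // Swap the red and blue channels.
    body.push("  c = c.bgra;");
  }
  if (opts.alpha === "premultiply") {
    body.push("  c.rgb *= c.a;");
  } else if (opts.alpha === "unpremultiply") {
    // Guard against division by zero for fully transparent texels.
    body.push("  c.rgb = c.a > 0.0 ? c.rgb / c.a : vec3(0.0);");
  }
  body.push("  gl_FragColor = c;");

  const header = [
    "precision mediump float;",
    "uniform sampler2D uSource;",
    "varying vec2 vTexCoord;",
    "void main() {",
  ];
  return header.concat(body, ["}"]).join("\n");
}
```

Generating the source from a small option set means each distinct combination maps to exactly one shader, which is also a natural key for whatever caching scheme ends up being used.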
Note we'll also have to handle BGRA->RGBA swizzling, but that's not a problem in a fragment shader either. Probably the most nontrivial part will be deciding how to manage the various cached objects involved. To do the conversion, each distinct format pair needs 2 compiled shaders and 1 linked program, so it's not obvious how long to keep them around in case they are needed again for another texture conversion between the same formats. As a first approach, it seems reasonable to keep the shaders/programs for each format conversion alive for the whole lifetime of the WebGL context; then, as an optimization over that, either set a limit on the cache size and discard the least recently used shaders/programs when needed, or free them based on a timer.
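For the size-limited variant, something along these lines might work. This is only a sketch: ConversionProgramCache and its create/destroy callbacks are hypothetical names, the format identifiers are plain strings for simplicity, and a real version would compile/link in create and call gl.deleteProgram (and gl.deleteShader) in destroy. It uses a linear scan, which should be fine since the number of format pairs is small.

```typescript
// Hypothetical LRU cache of conversion programs, keyed by (source, destination) format.
// P stands in for whatever bundles the 2 shaders and 1 program of a conversion.
class ConversionProgramCache<P> {
  // Ordered from least recently used (front) to most recently used (back).
  private entries: { key: string; program: P }[] = [];
  private capacity: number;
  private create: (srcFormat: string, dstFormat: string) => P;
  private destroy: (program: P) => void;

  constructor(
    capacity: number,
    create: (srcFormat: string, dstFormat: string) => P,
    destroy: (program: P) => void
  ) {
    if (capacity < 1) throw new Error("capacity must be at least 1");
    this.capacity = capacity;
    this.create = create;
    this.destroy = destroy;
  }

  get(srcFormat: string, dstFormat: string): P {
    const key = srcFormat + "->" + dstFormat;
    for (let i = 0; i < this.entries.length; i++) {
      if (this.entries[i].key === key) {
        // Cache hit: move the entry to the most-recently-used position.
        const hit = this.entries.splice(i, 1)[0];
        this.entries.push(hit);
        return hit.program;
      }
    }
    // Cache miss: build the program, then evict from the front if over capacity.
    const program = this.create(srcFormat, dstFormat);
    this.entries.push({ key: key, program: program });
    while (this.entries.length > this.capacity) {
      const victim = this.entries.shift()!;
      this.destroy(victim.program);
    }
    return program;
  }

  size(): number {
    return this.entries.length;
  }
}
```

The keep-everything first approach is the same structure with a very large capacity, so starting with it and tightening the limit later should not require changing callers. A timer-based policy would need a last-used timestamp per entry instead of relying on ordering alone.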