Closed Bug 748100 (ICODecoder) Opened 12 years ago Closed 10 years ago

Some favicon.ico files like www.golem.de/favicon.ico do not appear in the fennec URL bar - dont decode properly

Categories

(Firefox for Android Graveyard :: General, defect)

All
Android
defect
Not set
normal

Tracking

(relnote-firefox 29+, blocking-fennec1.0 -, fennec+)

RESOLVED FIXED
Firefox 29
Tracking Status
relnote-firefox --- 29+
blocking-fennec1.0 --- -
fennec + ---

People

(Reporter: mbrubeck, Assigned: ckitching)

References

Details

(Whiteboard: [QA+])

Attachments

(2 files, 12 obsolete files)

106.92 KB, image/png
Details
62.71 KB, patch
Details | Diff | Splinter Review
To reproduce, load http://www.golem.de/ in native Fennec.

Expected results: Favicon appears in the URL bar.
Actual results: The default "no favicon" icon appears in the URL bar.

http://www.golem.de/favicon.ico is the favicon.  It renders correctly in desktop Firefox and in XUL Fennec, but is missing in Native Fennec and in the stock Android browser.  It is probably an example of an ICO file that is not compatible with Android's Skia library which we use to draw favicons in Native Fennec.  See bug 726001 comment 5 for context.
Possibly related:  http://www.3dcenter.org/sites/all/themes/dreid_d5/favicon.ico is an ICO file that renders but with very poor quality, in both the stock browser and Native Fennec.  (It renders with much higher quality in XUL Fennec and on desktop.)

Thank you to Bockworscht on Mozillazine for both of these examples.
tracking-fennec: --- → ?
Summary: Some favicon.ico files like www.golem.de/favicon.ico do not appear in the fennec URL bar → Some favicon.ico files like www.golem.de/favicon.ico do not appear in the fennec URL bar - dont decode properly
Nominating. This hits some larger sites such as http://www.flickr.com/favicon.ico where the colors are off.
blocking-fennec1.0: --- → ?
tracking-fennec: ? → 15+
blocking-fennec1.0: ? → -
We could try to expose the Gecko ICO decoder and use it in Java.
tracking-fennec: 15+ → +
This isn't surprising - Android's BitmapFactory decoder doesn't support ico[1]. Since ICO is a container format, it is just a collection of one or more PNG or BMP images surrounded by some extra guff to index them[2]. To a png/bmp decoder, an ICO containing just one image of the appropriate type will look like a sane image with some extra junk around it - a clever decoder plausibly can see past this and make things work. I suspect that this is the only reason that Fennec ever manages to decode favicon ICO files (That and it seems that some sites have a thing called "favicon.ico" which is really a gif or a png - but that's cheating).

It's also worth noting that since ICOs can contain multiple images of different resolutions - it'd be nice if we were to decode these properly and extract the largest such image. Most likely the easiest way to win is to do just what mfinkle suggested in Comment 3.

[1]http://developer.android.com/guide/appendix/media-formats.html
[2]http://en.wikipedia.org/wiki/ICO_%28file_format%29#Outline
favicon.ico can be a real ico file, but a good decoder probably doesn't care what the extension is and would properly decode a gif if you named it that too.

I started to put together a solution for this once in bug 855911 using message passing. Exposing Gecko's image decoding directly would be nice too.
Assignee: nobody → ckitching
Depends on: 888326
No longer depends on: 726001
Attached image Victory!
It occurred to me that the ICO format actually isn't that complicated, and since it's a container format no actual image decoding work needs to happen.
Instead of paying the (vast) cost of decoding the image in Gecko-land and sending it across the bridge, we can just parse the ICO's metadata in Java, extract the largest PNG image that's hiding inside, and pass that to the Android decoding library like we've always done.

This patch is sort of rolled up in my ongoing work for Bug 888326 - I'll be uploading the patch series there in a few hours. Problem solved!

Here's a shiny screenshot of the little Favicon that could.
Depends on the patches from Bug 888326. Adds ability to decode ICO files by leveraging the ability of Android to decode just about everything else (Essentially, the "decoder" provided just reads the ICO metadata and works out the appropriate chunk of the ICO that represents a valid image in either BMP or PNG format, before feeding that to Android for decoding.)

Also - ICO is a properly, properly weird format, it seems.
Attachment #800458 - Flags: feedback?(margaret.leibovic)
Well done, Chris! You jumped into something (ICO container format) we were avoiding and made it look easy.
So after some more work I made some slightly interesting discoveries. Turns out my earlier thoughts were partially incorrect.

Contrary to the documentation, the decoder used by Android DOES support ICOs. The problem is it's really not that good at it. If the ICO contains a PNG, it seems to fail completely (Causing this bug). If the ICO contains a BMP, it just seems to select an arbitrary one of the BMPs in the ICO and decodes that (Causing Bug 813810 and Bug 855911 when it gets the wrong one and we upscale it horribly.).

But what it can do reliably is decode PNGs, and, if presented with an ICO with exactly one BMP it also works no problem (BMPs inside ICOs are actually somewhat tricky to decode - they're not really BMPs. By comparism, ICO-ified PNGs are literally PNGs you can cut out and paste elsewhere)

So, slight change to the earlier plan. Previously, whatever was downloaded would be run through Android and, if we got back null, I'd run it through my ICODecoder, chop out the payload for the largest image, and run that through Android. This failed in two ways:
1) If the thing downloaded was an ICO containing BMPs, Android wouldn't fail to decode it, but might extract the wrong image from it and give us ugly pictures.
2) The code present to reconstitute BMP headers to facilitate traditional BMP decoding after extracting the payload from a BMP-bearing ICO was never used, as I had not realised that Android doesn't fail to decode such images.

In this new approach, I determine if the downloaded image is an ICO (Via exclusion - every other supported image type has an associated magic number sequence that can be check, ICO does not) and, if it is, determine the nature of the payload of the largest image therein.
If the largest image is a PNG, the payload is extracted and handed to Android, just as before.
If the largest image is a BMP, all other payloads are stripped from the ICO, and Android is asked to decode the resulting pruned ICO (Consisting of only the icon we want), thus handily sidestepping the issue of dealing with the wizardry that goes into BMP encoding in ICOs.

If the downloaded image was found to be an ordinary, non-ICO image, it's simply decoded in the obvious way.

The results of always getting the largest Favicon available are quite striking:

Before:
https://www.dropbox.com/s/h2u6wh1xtspdggg/NightlyHistory.png

After:
https://www.dropbox.com/s/lel6cuk096gfh4i/MyHistory.png
Attachment #800458 - Attachment is obsolete: true
Attachment #800458 - Flags: feedback?(margaret.leibovic)
Attachment #800625 - Flags: review?(margaret.leibovic)
Nice. Grr.. Sites will go to crazy lengths to repack png's and save a few bytes, but seem happy to send us these ico files filled with image sizes we'll never show.
(In reply to Wesley Johnston (:wesj) from comment #11)
> Nice. Grr.. Sites will go to crazy lengths to repack png's and save a few
> bytes, but seem happy to send us these ico files filled with image sizes
> we'll never show.

Indeed. Microsoft.com serves one up with 128x128, 78x78, 64x64, 48x48, 32x32, 24x24 and 16x16.
Clearly, if you invent a format you have to maximise your use of it. :P
Come to think of it, the NVidia is still showing some weird artefacts.. Looks like overzealous dithering. The increase in resolution has helped, but there's still some marked difference in the way it's being processed by us vs. Chromium.
Check it out:
http://www.nvidia.com/favicon.ico

On chrome it's a solid green, on Fennec it's got the funky specklyness. Strange. Perhaps uncovering the reasons why the "support" for ICO decoding on Android is undocumented - it seems buggy. I wonder if disabling dithering will make a difference... This might be an artefact we'll have to live with.
I haven't seen the chrome icon. To see the higher res icon on desktop use:

http://www.nvidia.com/favicon.ico#-moz-resolution=256,256

(you may have to hard refresh).
Comment on attachment 800625 [details] [diff] [review]
V2: Further improvements to the decoder.

Wes knows this stuff better than me, so I'll give him the pleasure of reviewing :)
Attachment #800625 - Flags: review?(margaret.leibovic) → review?(wjohnston)
So the previous set of pictures were nice, but the NVidia Favicon was still suffering from a case of weird specklyness - made less bad by being enlarged, but still weird and sucky.
Fixed that:

https://www.dropbox.com/s/dd8aj12yektrah3/NvidiaFix.png

Turns out NVidia, ironically, aren't very good at image encoding. Their Favicon contained three different sizes, and three images of each size. Each of the duplicates of a particular size was encoded at a different colour depth to the others.
But.. The NVidia logo is 100% green. You could represent each pixel (Sans AA) with just one bit (And an appropriate palette). Strangely, their method of encoding the lower colour-depth versions of the icon introduced this horrid specklyness.
So, the ICODecoder now selects based on colour depth, as well as size, ensuring we always get the best image from an ICO, even in the face of properly, properly strange encoding.
Attachment #800625 - Attachment is obsolete: true
Attachment #800625 - Flags: review?(wjohnston)
Attachment #801039 - Flags: review?(wjohnston)
Blocks: 855911
Blocks: 895423
Status: NEW → ASSIGNED
No longer blocks: DecodeOnlyPossibles
Blocks: 914058
Depends on: FaviconRevamp
No longer depends on: 888326
Unbitrot, cleaning up some leftover debugging code.
Attachment #801039 - Attachment is obsolete: true
Attachment #801039 - Flags: review?(wjohnston)
Attachment #802079 - Flags: review?(wjohnston)
Blocks: 914950
Making less incomprehensible. Hopefully this one is more reviewer-friendly.
Attachment #802079 - Attachment is obsolete: true
Attachment #802079 - Flags: review?(wjohnston)
Attachment #802836 - Flags: review?(wjohnston)
Comment on attachment 802836 [details] [diff] [review]
V5: NVidia's icon no longer speckly.

Review of attachment 802836 [details] [diff] [review]:
-----------------------------------------------------------------

Looks really good. I asked for comments, and I got them. Too many maybe. Try to be careful your comment isn't just an english repeat of the code below it, and try to use fewer words where you can. I'm a bit torn about them tbh, because they do make reading this much much easier.

Holding off on r mostly because I think it would be good to split this up a bit, instead of having large functions (or making our already too large functions, larger). So I gave this f+. Just needs some cleanup before landing.

::: mobile/android/base/favicons/ICODecoder.java
@@ +25,5 @@
> + *  0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
> + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + * |  Reserved field. Must be zero |  Type (1 for ICO, 2 for CUR)  |
> + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + * |Image count (n)| Reserved (0)  |

Image count is 2 bytes long according to wikipedia?

@@ +106,5 @@
> +
> +        // Here, and in many other places, byte values are ANDed with 0xFF. This is because Java
> +        // bytes are signed - to obtain a numerical value of a longer type which holds the unsigned
> +        // interpretation of the byte of interest, we do this.
> +        int numEncodedImages = mDecodeee[mOffset+4] & 0xFF;

again, is this 2 bytes?

@@ +118,5 @@
> +        int bufferIndex = mOffset+ICO_HEADER_LENGTH;
> +
> +        // We now iterate over the Icon Directory, at each step checking to see if the image there
> +        // is better than our current best.
> +        for (int i = 0; i < numEncodedImages; i++) {

I would split everything in here into a separate function.

@@ +123,5 @@
> +            // At each step, bufferIndex is set to the base address of the Icon Directory Entry we
> +            // are currently considering.
> +
> +            // Favicons must be square. We check that the width and height fields of this Icon
> +            // Directory Entry are the same. If not, we skip this entry.

This isn't really true. We'd be fine with a non-square icon.

@@ +125,5 @@
> +
> +            // Favicons must be square. We check that the width and height fields of this Icon
> +            // Directory Entry are the same. If not, we skip this entry.
> +            if (mDecodeee[bufferIndex] != mDecodeee[bufferIndex+1]) {
> +                bufferIndex += 16;

Use your consts: ICO_ICONDIRENTRY_LENGTH

@@ +144,5 @@
> +                imageWidth = 256;
> +            }
> +
> +            // If the image uses a colour palette, this is the number of colours, else zero.
> +            int paletteSize = mDecodeee[bufferIndex+2] & 0xFF;

You might want to break this out into a function as well, since its repeated all over. i.e.

int getUInt8(int offset) {
  return mDecodeee[offset] & 0xFF;
}

@@ +150,5 @@
> +            // The plane count - usually 0 or 1. When > 1, taken as multiplier on bitsPerPixel.
> +            int colourPlanes = mDecodeee[bufferIndex+4] & 0xFF;
> +
> +            int bitsPerPixel = (mDecodeee[bufferIndex+6] & 0xFF)
> +                             | (mDecodeee[bufferIndex+7] & 0xFF) << 8;

And getUInt16(offset) here....

@@ +152,5 @@
> +
> +            int bitsPerPixel = (mDecodeee[bufferIndex+6] & 0xFF)
> +                             | (mDecodeee[bufferIndex+7] & 0xFF) << 8;
> +
> +            if (colourPlanes > 1) {

What are we, savages? "colorPlanes"

@@ +159,5 @@
> +
> +            // If we have found a new largest, record it.
> +            boolean isBestImage = false;
> +
> +            // Decide if we like this image better than the current best.

And if you split this function up, you'll need to keep this outside that function. But you'll probably have a better object anyway so you could do something like

if (curImg.compare(bestImg)) {
  bestImg = curImg;
}

which feels a bit more readable to me.

@@ +205,5 @@
> +
> +        // Since the ICO format is little-endian and Java bytes are signed, extracting ints from the
> +        // 4-byte fields becomes a mess.
> +        // Each byte is taken in turn, its unsigned representation stored in an int, which is then
> +        // shifted by the appropriate number of bytes, and ORed with its neighbours.

This comment is probably a little to much telling me what the code already says (and its pretty standard to see this stuff when parsing files). Putting it in a getUInt32(offset) function would be nice though.

@@ +238,5 @@
> +
> +    /**
> +     * Delete all entries from this ICO except the one selected in decodeLargestImage
> +     */
> +    public void prune() {

Is it worth doing all this and not just putting the bitmap in a buffer and handing that to Android? This seems fragile...

::: mobile/android/base/favicons/LoadFaviconTask.java
@@ +142,5 @@
>              HttpEntity entity = response.getEntity();
>              if (entity == null)
>                  return null;
>  
> +            // This may not be provided, but if it is, it's useful.

Started before you, but this function is way to long. Can you put this new code in some separate functions?

@@ +192,5 @@
> +
> +                        // Copy the contents of the old buffer into the new buffer.
> +                        System.arraycopy(buffer, 0, newBuffer, 0, buffer.length);
> +
> +                        // Replace the old buffer with the new. (Old one now available for GC.)

Again, the comments here are too verbose.

@@ +216,5 @@
> +                // So, we cheat. For PNG ICOs we extract the PNG and make Android decode it.
> +                // For BMP ICOs, we delete every BMP from the ICO that we don't want, and give the
> +                // pruned ICO to Android.
> +                //
> +                // This way, we don't have to do any of the actually difficult work of writing decoders.

This can be summarized. "We're probably dealing with an ico"

@@ +232,5 @@
> +                    return null;
> +                }
> +
> +                // No sanity checks failed determining the largest image
> +                if (decoder.getIsPNG()) {

Don't name this with a get. Just isPNG().

::: mobile/android/base/gfx/BitmapUtils.java
@@ +45,5 @@
> +     * @param offset Offset at which to look for magic numbers.
> +     * @return true if the buffer contains a bitmap decodable by Android (Or at least, a sequence
> +     *         starting with the magic numbers thereof). false otherwise.
> +     */
> +    public static boolean isDecodableByAndroid(byte[] buffer, int offset) {

I'm a bit leery that we really need this and can't trust mimetypes/extensions. But since its here, it might be a little cleaner to store these in an enum and loop over it. i.e:

enum ImgFormat {
  PNG(int[] { -0x77, 0x50, 0x43 }),
  GIF(int[] { 0x47, 0x49, ... }),
  ...

  public int[] magicNums;
  public ImgFormat(int[] nums)
}

And loop over
for (ImgFormat format : ImgFormat.values()) {
Attachment #802836 - Flags: review?(wjohnston) → feedback+
Attached patch V6: Cleanup. (obsolete) — Splinter Review
Thanks for the prompty review - note that this one isn't (Probably) going to make 26, since its dependancies are large and liable to be in review hell for a long time yet.

Still, addressing your complaints.. (And fixing the ones I'm not replying to as suggested)

> @@ +123,5 @@
> > +            // At each step, bufferIndex is set to the base address of the Icon Directory Entry we
> > +            // are currently considering.
> > +
> > +            // Favicons must be square. We check that the width and height fields of this Icon
> > +            // Directory Entry are the same. If not, we skip this entry.
> 
> This isn't really true. We'd be fine with a non-square icon.

Really? Okay then - check removed.

> @@ +144,5 @@
> > +                imageWidth = 256;
> > +            }
> > +
> > +            // If the image uses a colour palette, this is the number of colours, else zero.
> > +            int paletteSize = mDecodeee[bufferIndex+2] & 0xFF;
> 
> You might want to break this out into a function as well, since its repeated
> all over. i.e.
> 
> int getUInt8(int offset) {
>   return mDecodeee[offset] & 0xFF;
> }
> 
> @@ +150,5 @@
> > +            // The plane count - usually 0 or 1. When > 1, taken as multiplier on bitsPerPixel.
> > +            int colourPlanes = mDecodeee[bufferIndex+4] & 0xFF;
> > +
> > +            int bitsPerPixel = (mDecodeee[bufferIndex+6] & 0xFF)
> > +                             | (mDecodeee[bufferIndex+7] & 0xFF) << 8;
> 
> And getUInt16(offset) here....

I am quite reluctant to split the bitwise assembly of the bytes into an ints into seperate functions - the amount of work done calling the function greatly exceeds the amount of work done by the function in that case. It would make the code many times slower. I don't think these things, once you've figured out what they're doing, are particularly harmful to readabilitty, are they? Certainly, the suggested cleanup would have a detrimental effect on performance. (At least, pre-Proguard)

> @@ +159,5 @@
> > +
> > +            // If we have found a new largest, record it.
> > +            boolean isBestImage = false;
> > +
> > +            // Decide if we like this image better than the current best.
> 
> And if you split this function up, you'll need to keep this outside that
> function. But you'll probably have a better object anyway so you could do
> something like
> 
> if (curImg.compare(bestImg)) {
>   bestImg = curImg;
> }
> 
> which feels a bit more readable to me.

Since object creation in Java is astronomically expensive (And I am working under the assumption that this decoder has to be as high performance as is sane), implementing a comparator would be expensive.
Is the revised implementation (Which is in any case somewhat cleaner, just not in exactly the way suggested above) acceptable?

> @@ +205,5 @@
> > +
> > +        // Since the ICO format is little-endian and Java bytes are signed, extracting ints from the
> > +        // 4-byte fields becomes a mess.
> > +        // Each byte is taken in turn, its unsigned representation stored in an int, which is then
> > +        // shifted by the appropriate number of bytes, and ORed with its neighbours.
> 
> This comment is probably a little to much telling me what the code already
> says (and its pretty standard to see this stuff when parsing files). Putting
> it in a getUInt32(offset) function would be nice though.
> 
> @@ +238,5 @@
> > +
> > +    /**
> > +     * Delete all entries from this ICO except the one selected in decodeLargestImage
> > +     */
> > +    public void prune() {
> 
> Is it worth doing all this and not just putting the bitmap in a buffer and
> handing that to Android? This seems fragile...

What you suggest was my first implementation (It'd mean just doing exactly the same for PNGs and ICOs). The problem lies in the strange way that ICOs store BMPs - the BMP in an ICO is not *Really* a BMP. It lacks a BMP header, and the pixel table is (potentially) in a quite bizzaire format. It is such that just handing it to a BMP decoder will fail. My early work involved an attempt to reconstruct a BMP header describing the actual format of the BMP there, but it turns out that the information needed to do so is not available without iterating over the pixel array - a highly expensive operation to do in Java code. I despair for the sanity of this format.
Conversely, ICO stores PNGs in their entirety - a PNG in an ICO can literally be cut out and decoded (As is done). Alas, this cannot be done with BMP.

So - what to do? It turns out that Android's decoder of ICOs, while buggy, is capable of decoding single-image ICOs containing BMPs (But not PNGs). This means that we get to sidestep the problem of the strange BMP storage format entirely, and just make Android work that bit out for us - but this means working around the limitation of their decoder. As a result, the only way to choose which image it decodes (Something we want to do) is to do what the prune function does.
 
> @@ +192,5 @@
> > +
> > +                        // Copy the contents of the old buffer into the new buffer.
> > +                        System.arraycopy(buffer, 0, newBuffer, 0, buffer.length);
> > +
> > +                        // Replace the old buffer with the new. (Old one now available for GC.)
> 
> Again, the comments here are too verbose.

Very much so. Cleaned up the whole region quite a bit now. I took you seriously when you asked for more comments :P.

> ::: mobile/android/base/gfx/BitmapUtils.java
> @@ +45,5 @@
> > +     * @param offset Offset at which to look for magic numbers.
> > +     * @return true if the buffer contains a bitmap decodable by Android (Or at least, a sequence
> > +     *         starting with the magic numbers thereof). false otherwise.
> > +     */
> > +    public static boolean isDecodableByAndroid(byte[] buffer, int offset) {
> 
> I'm a bit leery that we really need this and can't trust
> mimetypes/extensions. But since its here, it might be a little cleaner to
> store these in an enum and loop over it. i.e:
> 
> enum ImgFormat {
>   PNG(int[] { -0x77, 0x50, 0x43 }),
>   GIF(int[] { 0x47, 0x49, ... }),
>   ...
> 
>   public int[] magicNums;
>   public ImgFormat(int[] nums)
> }
> 
> And loop over
> for (ImgFormat format : ImgFormat.values()) {

We absolutely need this. Quite a number of sites serve up things called "favicon.ico" which are really pngs or gifs.
The only reliable way of determining the type of a file is to look at the file. Why shouldn't we do this, given that we can, it costs very little, and it decreases the amount of times we fail to decode a favicon. Removal of this will make the situation strictly worse (It is also worth noting that other browsers, desktop FF included, don't fail to decode a favicon when presented with an PNG claiming to be an ICO).

Your suggestion about cleanup seems sensible. Using an enum does make it considerably more concise.

If you do r+ this - don't mark for checkin - dependancies not yet landed - it will not end well.
Attachment #802836 - Attachment is obsolete: true
Attachment #804388 - Flags: review?(wjohnston)
Alias: ICODecoder
Rebased on top of V32 of the favicon cache patch.

Added comments pointing to Bug 914058 - necessary to make completely optimal use of ICOs, although I lack the time at the moment to do that additional work.
Decoding the largest and downscaling is better than nothing, right? Can handle the remainder in Bug 914058.
Attachment #804388 - Attachment is obsolete: true
Attachment #804388 - Flags: review?(wjohnston)
Attachment #816098 - Flags: review?(rnewman)
Whiteboard: [QA+]
Comment on attachment 816098 [details] [diff] [review]
V7 - Add support for decoding ICOs

Review of attachment 816098 [details] [diff] [review]:
-----------------------------------------------------------------

My initial review comment: zomg this needs tests.

I want to see a bundle of .ico files of different sizes, compositions, empty ones, invalid ones, giant ones, corrupt ones, ones with PNGs, ones with bitmaps, some with both... because I don't want us crashing/OOMing/otherwise misbehaving when we're hoovering bytes off the web.

Given that ICODecoder is standalone, and there's no reason why LoadFaviconTask should be the right home for bytes -> Bitmap, I propose a refactoring here to make a standalone IconDecoder class that does exactly that.

Then write a pile of black-box tests for it.

Code review comments coming later.
Comment on attachment 816098 [details] [diff] [review]
V7 - Add support for decoding ICOs

Review of attachment 816098 [details] [diff] [review]:
-----------------------------------------------------------------

Couple more flying comments. This definitely would benefit from some tests to illustrate the edge cases, and the rearchitecting I mentioned in my last comment.

::: mobile/android/base/favicons/ICODecoder.java
@@ +63,5 @@
> +    public static final int ICO_HEADER_LENGTH = 6;
> +    public static final int ICO_ICONDIRENTRY_LENGTH = 16;
> +
> +    // The buffer containing bytes to attempt to decode.
> +    private final byte[] mDecodeee;

s/eee/ee

@@ +81,5 @@
> +    // interest.
> +    private int mBestIndex;
> +
> +    private int mLargestWidth;
> +    private int mBestPalletteSize = -1;

Palette

@@ +87,5 @@
> +
> +    public ICODecoder(byte[] buffer, int offset, int len) {
> +        mDecodeee = buffer;
> +        mOffset = offset;
> +        mLen = len;

if (buffer.length < (offset + len)) {
    throw new IllegalArgumentException("Provided ICO buffer too small.");
}
if (len < ICO_HEADER_LENGTH) {
    throw new IllegalArgumentException("Provided length too small for ICO header.");
}

And if not, then probably this is a PNG, right?

@@ +104,5 @@
> +        // specifies ICO. If not, we've probably been given something that isn't really an ICO.
> +        if (mDecodeee[mOffset] != 0
> +         || mDecodeee[mOffset+1] != 0
> +         || mDecodeee[mOffset+2] != 1
> +         || mDecodeee[mOffset+3] != 0) {

|| at end of line, not start; spaces around arithmetic ops. (Throughout.)

@@ +141,5 @@
> +        // Directory Entry we want to look at, getting us to the first byte of the Icon Directory
> +        // Index of interest.
> +        // Finally, we add 8 - the offset of the first byte of the image data size field in the
> +        // Icon Directory Entry of interest.
> +        bufferIndex = mOffset + ICO_HEADER_LENGTH + (ICO_ICONDIRENTRY_LENGTH * mBestIndex) + 8;

What is mBestIndex if this is an empty icon directory?

@@ +147,5 @@
> +        // Since the ICO format is little-endian and Java bytes are signed, extracting ints from the
> +        // 4-byte fields becomes a mess.
> +        // Each byte is taken in turn, its unsigned representation stored in an int, which is then
> +        // shifted by the appropriate number of bytes, and ORed with its neighbours.
> +        mDecodedLength = (mDecodeee[bufferIndex] & 0xFF)

... will throw an ArrayIndexOutOfBounds exception with a malformed mBestIndex, no?

@@ +158,5 @@
> +
> +        mDecodedOffset = (mDecodeee[bufferIndex] & 0xFF)
> +                       | (mDecodeee[bufferIndex+1] & 0xFF) << 8
> +                       | (mDecodeee[bufferIndex+2] & 0xFF) << 16
> +                       | (mDecodeee[bufferIndex+3] & 0xFF) << 24;

Check your ranges!

@@ +185,5 @@
> +     * @param iconDirectoryIndex The index in the Icon Directory of this entry.
> +     */
> +    private void consumeIconDirEntry(int bufferIndex, int iconDirectoryIndex) {
> +        // Verify that the reserved field is really zero. If not, bin the entry.
> +        if(mDecodeee[bufferIndex+3] != 0) {

Space before condition.
Attachment #816098 - Flags: review?(rnewman) → feedback+
Some flying responses.

(In reply to Richard Newman [:rnewman] from comment #23)
> Comment on attachment 816098 [details] [diff] [review]
> V7 - Add support for decoding ICOs
> 
> Review of attachment 816098 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Couple more flying comments. This definitely would benefit from some tests
> to illustrate the edge cases, and the rearchitecting I mentioned in my last
> comment.
> 
> ::: mobile/android/base/favicons/ICODecoder.java
> @@ +63,5 @@
> > +    public static final int ICO_HEADER_LENGTH = 6;
> > +    public static final int ICO_ICONDIRENTRY_LENGTH = 16;
> > +
> > +    // The buffer containing bytes to attempt to decode.
> > +    private final byte[] mDecodeee;
> 
> s/eee/ee

But it is the one being decoded. The decode-ee. :P
Same construction as 

> @@ +87,5 @@
> > +
> > +    public ICODecoder(byte[] buffer, int offset, int len) {
> > +        mDecodeee = buffer;
> > +        mOffset = offset;
> > +        mLen = len;
> 
> if (buffer.length < (offset + len)) {
>     throw new IllegalArgumentException("Provided ICO buffer too small.");
> }
> if (len < ICO_HEADER_LENGTH) {
>     throw new IllegalArgumentException("Provided length too small for ICO
> header.");
> }
> 
> And if not, then probably this is a PNG, right?

No. This throw is "The total size of the bitmap you gave me to decode is less than the size needed to store an ICO header".
That said, this should probably not actually throw, it should just fail to decode in the decode function so we can handle it neatly.

It certainly doesn't mean it's probably a PNG - if memory serves, we always attempt to decode the payload using Android's decoder before we hand it to the ICO decoder. As such, anything that gets here is known to not be one of the image types Android can decode. It's either an ICO or something strange/broken.

Certainly, though, the ICO decoder, since it's being given any blobs we don't know how to decode via a different approach, needs to be able to not explode in the face of random data.

> @@ +141,5 @@
> > +        // Directory Entry we want to look at, getting us to the first byte of the Icon Directory
> > +        // Index of interest.
> > +        // Finally, we add 8 - the offset of the first byte of the image data size field in the
> > +        // Icon Directory Entry of interest.
> > +        bufferIndex = mOffset + ICO_HEADER_LENGTH + (ICO_ICONDIRENTRY_LENGTH * mBestIndex) + 8;
> 
> What is mBestIndex if this is an empty icon directory?

It shouldn't matter - cases like that should return false from the decode call. I think this particular case isn't handled, though. I'll tack that on.

> 
> @@ +147,5 @@
> > +        // Since the ICO format is little-endian and Java bytes are signed, extracting ints from the
> > +        // 4-byte fields becomes a mess.
> > +        // Each byte is taken in turn, its unsigned representation stored in an int, which is then
> > +        // shifted by the appropriate number of bytes, and ORed with its neighbours.
> > +        mDecodedLength = (mDecodeee[bufferIndex] & 0xFF)
> 
> ... will throw an ArrayIndexOutOfBounds exception with a malformed
> mBestIndex, no?

Possibly.

Actually, I need to go through this again and make sure fields containing mad values don't cause exceptions (Or, perhaps, that they do but this is okay - it might be in order to just wrap the whole lot in an exception handler and claim failure if something throws.)

> @@ +158,5 @@
> > +
> > +        mDecodedOffset = (mDecodeee[bufferIndex] & 0xFF)
> > +                       | (mDecodeee[bufferIndex+1] & 0xFF) << 8
> > +                       | (mDecodeee[bufferIndex+2] & 0xFF) << 16
> > +                       | (mDecodeee[bufferIndex+3] & 0xFF) << 24;
> 
> Check your ranges!

Or don't, stick the entire function in a try block, and deal with the exceptions as failures. (The ranges will only be violated if the provided image isn't a sane ICO.)
This is probably neater (And possibly faster) than range checking everywhere.
(In reply to Chris Kitching [:ckitching] from comment #24)

> > s/eee/ee
> 
> But it is the one being decoded. The decode-ee. :P
> Same construction as 

advise/advisee.

http://www.merriam-webster.com/dictionary/advisee


> It certainly doesn't mean it's probably a PNG - if memory serves, we always
> attempt to decode the payload using Android's decoder before we hand it to
> the ICO decoder.

This is partly why I want this to be IconDecoder -- downloaded thing in, Bitmap out. The caller should not be responsible for sniffing content.


> Actually, I need to go through this again and make sure fields containing
> mad values don't cause exceptions (Or, perhaps, that they do but this is
> okay - it might be in order to just wrap the whole lot in an exception
> handler and claim failure if something throws.)

And tests for those things :)


> Or don't, stick the entire function in a try block, and deal with the
> exceptions as failures. (The ranges will only be violated if the provided
> image isn't a sane ICO.)
> This is probably neater (And possibly faster) than range checking everywhere.

Honestly, it's cheaper to validate the numeric range of some integers that you already computed than to have a library call throw and us catch it, and it will allow us to implement more fine-grained error detection and recovery ("oh, this image is malformed, but the container is fine, so let's use the next one...").

Also it's good practice to not rely on Java being immune to (security-affecting?) coding problems!
(In reply to Richard Newman [:rnewman] from comment #23)
> @@ +87,5 @@
> > +
> > +    public ICODecoder(byte[] buffer, int offset, int len) {
> > +        mDecodeee = buffer;
> > +        mOffset = offset;
> > +        mLen = len;
> 
> if (buffer.length < (offset + len)) {
>     throw new IllegalArgumentException("Provided ICO buffer too small.");
> }
> if (len < ICO_HEADER_LENGTH) {
>     throw new IllegalArgumentException("Provided length too small for ICO
> header.");
> }

Added range checks to the decode function for this. Such madness will now gracefully fail to decode the ICO.
Also added some more checks to decodeLargestImage - I'm now fairly sure that random insane data will not cause it to crash, the method will simply return false and no icon will be displayed.

> @@ +104,5 @@
> > +        // specifies ICO. If not, we've probably been given something that isn't really an ICO.
> > +        if (mDecodeee[mOffset] != 0
> > +         || mDecodeee[mOffset+1] != 0
> > +         || mDecodeee[mOffset+2] != 1
> > +         || mDecodeee[mOffset+3] != 0) {
> 
> || at end of line, not start; spaces around arithmetic ops. (Throughout.)

Ugh.

> @@ +141,5 @@
> > +        // Directory Entry we want to look at, getting us to the first byte of the Icon Directory
> > +        // Index of interest.
> > +        // Finally, we add 8 - the offset of the first byte of the image data size field in the
> > +        // Icon Directory Entry of interest.
> > +        bufferIndex = mOffset + ICO_HEADER_LENGTH + (ICO_ICONDIRENTRY_LENGTH * mBestIndex) + 8;
> 
> What is mBestIndex if this is an empty icon directory?

Certainly uninteresting, since control never reaches this point any more if the icon directory is empty, or there is insufficient space to hold as many entries as there should be.

> 
> @@ +147,5 @@
> > +        // Since the ICO format is little-endian and Java bytes are signed, extracting ints from the
> > +        // 4-byte fields becomes a mess.
> > +        // Each byte is taken in turn, its unsigned representation stored in an int, which is then
> > +        // shifted by the appropriate number of bytes, and ORed with its neighbours.
> > +        mDecodedLength = (mDecodeee[bufferIndex] & 0xFF)
> 
> ... will throw an ArrayIndexOutOfBounds exception with a malformed
> mBestIndex, no?

mBestIndex can only contain a value between 0 and numEncodedImages - 1.
There is now a check to ensure that there is enough space in the buffer for numEncodedImages many icon directory entries to exist, so no matter the value of mBestIndex the entire directory entry for that index will exist and be readable without exceptions.
The result might be nonsense, however - this is picked up by subsequent checks on the decoded offset and lengths.
Finally, if the input is nonsense that happens to decode to a noninsane offset and length, the Android image decoder has the fun job of dealing with attempting to decode the resulting noise - which it does gracefully by returning null.

> @@ +158,5 @@
> > +
> > +        mDecodedOffset = (mDecodeee[bufferIndex] & 0xFF)
> > +                       | (mDecodeee[bufferIndex+1] & 0xFF) << 8
> > +                       | (mDecodeee[bufferIndex+2] & 0xFF) << 16
> > +                       | (mDecodeee[bufferIndex+3] & 0xFF) << 24;
> 
> Check your ranges!

With the new checks, we can never reach this point if the region being accessed is not in the array,

(In reply to Richard Newman [:rnewman] from comment #25)
> This is partly why I want this to be IconDecoder -- downloaded thing in,
> Bitmap out. The caller should not be responsible for sniffing content.

That's a sensible idea. Makes adding more decoders in the future nicer, too.

> > Or don't, stick the entire function in a try block, and deal with the
> > exceptions as failures. (The ranges will only be violated if the provided
> > image isn't a sane ICO.)
> > This is probably neater (And possibly faster) than range checking everywhere.
> 
> Honestly, it's cheaper to validate the numeric range of some integers that
> you already computed than to have a library call throw and us catch it, and
> it will allow us to implement more fine-grained error detection and recovery
> ("oh, this image is malformed, but the container is fine, so let's use the
> next one...").

Yes. My suggestion was idiotic.


A final caveat: I do not actually have a working Android phone available at this point, as mine died recently in a tragic bicycle accident involving a japanese tourist, a humpback bridge, and a river.
So I am unable to actually test these changes... Still. It compiles and the changes shouldn't affect the function, so this should still be correct.

No work has yet been done in the direction of unit tests. Do we have JUnit yet? How does one write Java unit tests in Fennec? Robocop seems to be geared towards integration tests.
Certainly, I'll shortly provide a useful collection of interesting favicons I have lying around from testing this and the other favicon code.
Attachment #816098 - Attachment is obsolete: true
Attachment #831714 - Flags: review?(rnewman)
Comment on attachment 831714 [details] [diff] [review]
V8 - Add support for decoding ICOs

Review of attachment 831714 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great! (r=rnewman in the comment, though.)

Next one should be a rubberstamp.

::: mobile/android/base/favicons/decoders/FaviconDecoder.java
@@ +1,1 @@
> +package org.mozilla.gecko.favicons.decoders;

Nit: license.

@@ +31,5 @@
> +    private static boolean isDecodableByAndroid(byte[] buffer, int offset) {
> +        magicNumbers:
> +        for (ImageMagicNumbers m : ImageMagicNumbers.values()) {
> +            for (int i = 0; i < m.value.length; i++) {
> +                if (buffer[offset + i] != m.value[i]) {

Perhaps buy some reusability?

private static boolean bufferStartsWith(byte[] buffer, byte[] test, int bufferOffset) {
    if (buffer.length < test.length) {
        return false;
    }
    for (int i = 0; i < test.length; ++i) {
        if (buffer[bufferOffset + i] != test[i]) {
            return false;
        }
    }
    return true;
}

Companion to the Java `Arrays` utility class methods.

@@ +69,5 @@
> +
> +        if (decoder.isPNG()) {
> +            // The payload we want is a PNG - ask Android to decode it.
> +            return BitmapUtils.decodeByteArray(buffer, decoder.getOffset(), decoder.getLength());
> +        } else {

No need for `else`.

@@ +72,5 @@
> +            return BitmapUtils.decodeByteArray(buffer, decoder.getOffset(), decoder.getLength());
> +        } else {
> +            // The payload we want is a BMP - need to prune the other entries before Android
> +            // can decode it for us, in order to work around a bug in the Android decoder.
> +            decoder.prune();

Why isn't this done automatically by the decoder?

@@ +73,5 @@
> +        } else {
> +            // The payload we want is a BMP - need to prune the other entries before Android
> +            // can decode it for us, in order to work around a bug in the Android decoder.
> +            decoder.prune();
> +            return BitmapUtils.decodeByteArray(buffer, decoder.getOffset(), decoder.getLength());

I wrote a little comment about how this logic should be inside the decoder, but then I realized that this avoids the decoder having any dependency on BitmapUtils. Small comment to that effect?

::: mobile/android/base/favicons/decoders/ICODecoder.java
@@ +59,5 @@
> + * The Icon Directory consists of n-many Icon Directory Entries in sequence, with no gaps.
> + */
> +public class ICODecoder {
> +    // The geometry of an ICO file.
> +    public static final int ICO_HEADER_LENGTH = 6;

_BYTES

@@ +63,5 @@
> +    public static final int ICO_HEADER_LENGTH = 6;
> +    public static final int ICO_ICONDIRENTRY_LENGTH = 16;
> +
> +    // The buffer containing bytes to attempt to decode.
> +    private final byte[] mDecodee;

Now that I think of it, shouldn't this be `mDecodand`? :P

@@ +129,5 @@
> +        if (numEncodedImages <= 0) {
> +            return false;
> +        }
> +
> +        // Fail if there is not enough space in the buffer for the stated number of icondir entries.

", let alone the data."

@@ +161,5 @@
> +        // Icon Directory Entry of interest.
> +        bufferIndex = mOffset + ICO_HEADER_LENGTH + (ICO_ICONDIRENTRY_LENGTH * mBestIndex) + 8;
> +
> +        // Since the ICO format is little-endian and Java bytes are signed, extracting ints from the
> +        // 4-byte fields becomes a mess.

It's even more of a mess if those 4-byte fields are supposed to be unsigned, because you're not going to be able to store the maximum value in a Java int. Consider using a long instead, and making sure there's a test for the max decoded length...

@@ +185,5 @@
> +
> +        // Check for PNG magic numbers at the start of the payload.
> +        if (mDecodee[mDecodedOffset] == -0x77
> +         && mDecodee[mDecodedOffset + 1] == 0x50
> +         && mDecodee[mDecodedOffset + 2] == 0x4e) {

Operators at end of line, but really this is

  mPayloadIsPNG = bufferStartsWith(mDecodee, ImageMagicNumbers.PNG, mDecodedOffset);

isn't it?

@@ +221,5 @@
> +        // The plane count - usually 0 or 1. When > 1, taken as multiplier on bitsPerPixel.
> +        int colorPlanes = mDecodee[bufferIndex + 4] & 0xFF;
> +
> +        int bitsPerPixel = (mDecodee[bufferIndex + 6] & 0xFF)
> +                         | (mDecodee[bufferIndex + 7] & 0xFF) << 8;

| at end of line, not start.

@@ +230,5 @@
> +
> +        // If we have found a new largest, record it.
> +        boolean isBestImage = false;
> +
> +        // Decide if we like this image better than the current best.

Don't you want to verify that the image data offset is valid before you do this?

@@ +235,5 @@
> +        if (imageWidth > mLargestWidth) {
> +            isBestImage = true;
> +        } else if (imageWidth == mLargestWidth) {
> +            // After that, prefer maximum colourfulness.
> +            if (bitsPerPixel > mBestBitsPerPixel) {

I think we want to prefer the native BPP: GeckoAppShell.getScreenDepth(). Add that as a constructor argument?

@@ +277,5 @@
> +        // Copy the entry-to-keep to the start of the image entry list.
> +        System.arraycopy(mDecodee, targetEntryOffset, mDecodee, mOffset + ICO_HEADER_LENGTH, ICO_ICONDIRENTRY_LENGTH);
> +
> +        // The offset we wish to put the retained image payload at (First byte after the retained
> +        // Icon Directory Entry.

Syntax.
Attachment #831714 - Flags: review?(rnewman) → feedback+
(In reply to Richard Newman [:rnewman] from comment #27)
> @@ +31,5 @@
> > +    private static boolean isDecodableByAndroid(byte[] buffer, int offset) {
> > +        magicNumbers:
> > +        for (ImageMagicNumbers m : ImageMagicNumbers.values()) {
> > +            for (int i = 0; i < m.value.length; i++) {
> > +                if (buffer[offset + i] != m.value[i]) {
> 
> Perhaps buy some reusability?
> 
> private static boolean bufferStartsWith(byte[] buffer, byte[] test, int
> bufferOffset) {
>     if (buffer.length < test.length) {
>         return false;
>     }
>     for (int i = 0; i < test.length; ++i) {
>         if (buffer[bufferOffset + i] != test[i]) {
>             return false;
>         }
>     }
>     return true;
> }
> 
> Companion to the Java `Arrays` utility class methods.

It might even be worth using some Apache Commons libraries for this sort of boring stuff in the future. There's all sorts of shiny toys available from those guys for doing uninteresting things in Java, and now we (Mostly sort of) have Proguard we can do so without horrid bloat.
Still. That's a whole new project.

Added bonus: We don't need a label any more. That's often a good sign.

> @@ +72,5 @@
> > +            return BitmapUtils.decodeByteArray(buffer, decoder.getOffset(), decoder.getLength());
> > +        } else {
> > +            // The payload we want is a BMP - need to prune the other entries before Android
> > +            // can decode it for us, in order to work around a bug in the Android decoder.
> > +            decoder.prune();
> 
> Why isn't this done automatically by the decoder?

It's something of a hack. What we have here isn't really an ideal decoder, and it doesn't make full use of the new caching system - I somewhat ran out of time when writing it.
What this does is always extracts the largest image from the ICO and hands that to the cache (Where it's downscaled and used as appropriate).
What we would ideally do is extract all the images and pass the whole shaboodle to the cache (There's even a currently-unused function of the cache for this, and much extra complexity in its data structures to correctly handle multiple image sizes). This approach gets complicated w.r.t the database, as you would, with our current schema, end up needing to store the encoded ICO in the DB and on-demand decode it when you need to fetch it again.
The thinking was that the need to prune in all cases when the payload is not a PNG is something that's sort of specific to this one particular special case. Ultimately, we'd want to reshuffle the entries and decode the image repeatedly to get all the BMPs out. By not merging this function with the decode function it's less nonobvious how to perform this extension.
(There's also a bug filed about this somewhere... Bug 914058). I'll probably get back to it at some point before the thermal death of the universe. If you want it sooner, you might have to do it yourself :P.

> @@ +73,5 @@
> > +        } else {
> > +            // The payload we want is a BMP - need to prune the other entries before Android
> > +            // can decode it for us, in order to work around a bug in the Android decoder.
> > +            decoder.prune();
> > +            return BitmapUtils.decodeByteArray(buffer, decoder.getOffset(), decoder.getLength());
> 
> I wrote a little comment about how this logic should be inside the decoder,
> but then I realized that this avoids the decoder having any dependency on
> BitmapUtils. Small comment to that effect?

That, and the thinking was to have the decoder really just decode the ICO container format, leaving the decoding of the actual image payloads to the caller.

> @@ +63,5 @@
> > +    public static final int ICO_HEADER_LENGTH = 6;
> > +    public static final int ICO_ICONDIRENTRY_LENGTH = 16;
> > +
> > +    // The buffer containing bytes to attempt to decode.
> > +    private final byte[] mDecodee;
> 
> Now that I think of it, shouldn't this be `mDecodand`? :P

Touché.

> @@ +129,5 @@
> > +        if (numEncodedImages <= 0) {
> > +            return false;
> > +        }
> > +
> > +        // Fail if there is not enough space in the buffer for the stated number of icondir entries.
> 
> ", let alone the data."

Oddly enough, I did initially think it like that, but pruned it to make it fit into the width limit. :P
Two-line comment it is, then.

> @@ +161,5 @@
> > +        // Icon Directory Entry of interest.
> > +        bufferIndex = mOffset + ICO_HEADER_LENGTH + (ICO_ICONDIRENTRY_LENGTH * mBestIndex) + 8;
> > +
> > +        // Since the ICO format is little-endian and Java bytes are signed, extracting ints from the
> > +        // 4-byte fields becomes a mess.
> 
> It's even more of a mess if those 4-byte fields are supposed to be unsigned,
> because you're not going to be able to store the maximum value in a Java
> int. Consider using a long instead, and making sure there's a test for the
> max decoded length...

I thought about this - while the fields are 4 bytes and unsigned, the only time we're going to have a problem is if the value exceeds (2^31)-1.
This would imply an encoded favicon size of a little over 2GB.
NOPE.

Assuming we actually managed to allocate that monster buffer and get to this point in the problem without OOMing or otherwise catching fire, that the length field overflows and causes the subsequent sanity check to abort the decode attempt is perhaps sensible. The ensuing decode operation on a 2GB encoded payload would take serious compute.
While this means the decoder isn't strictly standards compliant, I submit that the standard is stupid, and therefore should tactfully be ignored in this case.

> @@ +185,5 @@
> > +
> > +        // Check for PNG magic numbers at the start of the payload.
> > +        if (mDecodee[mDecodedOffset] == -0x77
> > +         && mDecodee[mDecodedOffset + 1] == 0x50
> > +         && mDecodee[mDecodedOffset + 2] == 0x4e) {
> 
> Operators at end of line, but really this is
> 
>   mPayloadIsPNG = bufferStartsWith(mDecodee, ImageMagicNumbers.PNG,
> mDecodedOffset);
> 
> isn't it?

So it is. Reusability for the win.

> @@ +230,5 @@
> > +
> > +        // If we have found a new largest, record it.
> > +        boolean isBestImage = false;
> > +
> > +        // Decide if we like this image better than the current best.
> 
> Don't you want to verify that the image data offset is valid before you do
> this?

Urg, this is true. It seems sensible to discard obviously invalid entries. Moar sanity checks!

> @@ +235,5 @@
> > +        if (imageWidth > mLargestWidth) {
> > +            isBestImage = true;
> > +        } else if (imageWidth == mLargestWidth) {
> > +            // After that, prefer maximum colourfulness.
> > +            if (bitsPerPixel > mBestBitsPerPixel) {
> 
> I think we want to prefer the native BPP: GeckoAppShell.getScreenDepth().
> Add that as a constructor argument?

Unclear - do things degrade gracefully when you provide a greater BPP image to a lesser BPP device?
There are some strange edgecases here, like the Nvidia favicon, which looks like soup unless taken at the highest available BPP, due to being weirdly encoded.

Still, it seems noninsane to prefer not to consume extra memory where available - my worry is that people incorrectly encoding the lower BPP versions isn't an isolated problem, and if it degrades gracefully (Just costing a little extra RAM) if you get too many bits, this is probably okay?
Attachment #831714 - Attachment is obsolete: true
Attachment #833286 - Flags: review?(rnewman)
(In reply to Chris Kitching [:ckitching] from comment #28)

> It might even be worth using some Apache Commons libraries for this sort of
> boring stuff in the future. There's all sorts of shiny toys available from
> those guys for doing uninteresting things in Java, and now we (Mostly sort
> of) have Proguard we can do so without horrid bloat.

Yeah, if Proguard turns it into essentially a one-by-one function import, and it doesn't adversely affect build times.

> Still. That's a whole new project.

Aye.


> What this does is always extracts the largest image from the ICO and hands
> that to the cache (Where it's downscaled and used as appropriate).

...

> That, and the thinking was to have the decoder really just decode the ICO
> container format, leaving the decoding of the actual image payloads to the
> caller.

In that case, we really want the decoder to return (or expose) something akin to the ICO directory format itself, right?

"Here's a sequence of entries referring to this byte array. The first is a PNG, 128x128, ...; the second is a ..."

That way we can hide all of this pruning nonsense (if we need the only-one hack, then return just one 'fake' entry), and also implement a fairly agnostic bitmap decoder shim that consumes those entries and a favicon cache, filling it with bitmaps.


> While this means the decoder isn't strictly standards compliant, I submit
> that the standard is stupid, and therefore should tactfully be ignored in
> this case.

Add a brief comment.


> Still, it seems noninsane to prefer not to consume extra memory where
> available - my worry is that people incorrectly encoding the lower BPP
> versions isn't an isolated problem, and if it degrades gracefully (Just
> costing a little extra RAM) if you get too many bits, this is probably okay?

That's a reasonable concern. If we can't experiment with this, can we make it an easily switchable behavior now, while we still all understand how this works?
(In reply to Richard Newman [:rnewman] from comment #29)
> > What this does is always extracts the largest image from the ICO and hands
> > that to the cache (Where it's downscaled and used as appropriate).
> 
> ...

Ah, you weren't there when this was talked about before. It seems to be sensible to land a first version that doesn't exploit the multiple-image property of ICOs, but istead just extracts the best image from them (Which is still a vast improvement over always extracting the smallest which we currently do, if we get anything at all (Which we often don't)), and do the extension work as a followup.

The reason for this is that the extension work is largeish and slightly risky (There's a bunch of awkwardness to deal with with the database, since it suddenly becomes necessary to decode ICOs again when you're refetching them from the database, as well as it activating the most complicated parts of the new caching data structures).
It'd be nice to get mostly-working ICO support landed independently of the wave of problems that may ensue with the full version, if this makes sense. Given where we are in the release cycle this thought may no longer make sense.

In principle this extra work isn't terribly hard, but the original thought was to get the simple version landed months ago. Unfortunately, the new caching system took way longer than expected to stop being broken, delaying all this. Since that happened about the same time my internship ended, the forward progress that would've been made here while the favicon cache was working its way through review never happened, and so here we are.
Sorry about that. I feel as if I've slightly misled you about this by omitting to explain this fully sooner.

I'll find time this week to implement the needed changes, but they should probably not be landed at the same time as the cruder version, as it's not something I trust myself to be able to get right first time without testing.

In any case, you're right that this patch would benefit from some refactoring to make it easier to work with, and less of a horrid rapidly-constructed hack.

> > That, and the thinking was to have the decoder really just decode the ICO
> > container format, leaving the decoding of the actual image payloads to the
> > caller.
> 
> In that case, we really want the decoder to return (or expose) something
> akin to the ICO directory format itself, right?
> 
> "Here's a sequence of entries referring to this byte array. The first is a
> PNG, 128x128, ...; the second is a ..."

This makes a lots of sense, actually. Particularly since the cache takes an Iterator<Bitmap> for adding collections of images.

> That way we can hide all of this pruning nonsense (if we need the only-one
> hack, then return just one 'fake' entry), and also implement a fairly
> agnostic bitmap decoder shim that consumes those entries and a favicon
> cache, filling it with bitmaps.

So the pruning nonsense is something we can't quite get rid of, annoyingly, and it's mostly Google's fault.

See, the format used for BMP payloads in ICOs isn't standard (Unlike PNG payloads, it's not just a BMP stored at an offset inside the ICO). Sufficiently so that if you cut out the payload and pass it to Android's decoder library it will fail (There's information in the icon directory entry necessary for decoding.)
Writing a system to decode these payloads is something that'd be nice to avoid having to do. Decoding crazy nonstandard formats invented by Microsoft is not something one does if one wants to keep one's sanity.

It turns out that if you present Android's bitmap decoder libary with an ICO containing a BMP payload with exactly one icon directory entry it will successfully decode that one image. If you present the Android decoder with an ICO with more than one icon directory entry, it often (But not quite always) fails to decode anything at all (And when it does it always gets the first one, which is usually the smallest one, and is the reason why so many icons do manage to decode, but look pixelated as heck when they do).
So, perfect, this is all we need to decode an ICO. The idea is to simply rearrange the icon directory to have only the image we care about in it each time we get Android to decode one.

This also seems to be something I've done a very poor job of both explaining and abstracting away from developers who value their sanity. I'll rework this to be less horrible (A nice "decode entry number X" function that hides the madness seems like a good plan).
Subsequently, then, creation of the needed Iterator<Bitmap> for multiple images is trivial.

> > Still, it seems noninsane to prefer not to consume extra memory where
> > available - my worry is that people incorrectly encoding the lower BPP
> > versions isn't an isolated problem, and if it degrades gracefully (Just
> > costing a little extra RAM) if you get too many bits, this is probably okay?
> 
> That's a reasonable concern. If we can't experiment with this, can we make
> it an easily switchable behavior now, while we still all understand how this
> works?

Of course. Make the decoder take the target BPP as an argument and, while decoding, have it aim for the highest available BPP <= the input for each size of image found.
Then we can "Turn it off" by passing Integer.MAX_VALUE.
After bribing a biochemist with biscuits to lend me a phone to test on, I've now implemented proper support for ICOs. No more kludgeful intermediate solution, we're doing this properly.

ICODecoder now provides a method for deleting an entry from an ICO. This method deletes the specified entry from the header, removes its payload from the ICO, moves payloads to the right of the removed payloads into the empty space, and updates the headers to make the offsets remain correct. This will be useful for minifying ICOs.

An ICO is now decoded in roughly two steps:

ICODecoder.decodeIconDirectoryAndPrune
This method decodes the Icon Directory into an array of IconDirectoryEntry objects, decides which entries we want to keep, and deletes all other entries from the underlying buffer.

Currently, it discards all images larger than the smallest available image larger than or equal in size to the density-adjusted maximum favicon size. 
For each size of image for which an image is kept, the image selected is that with the smallest colour depth greater than or equal to the result of GeckoAppShell.getScreenDepth(), or the image with the largest colour depth - whichever has the smallest colour depth.
If the colour depths are not sufficient to differentiate between images to be kept, the image with the largest palette size wins. Failing that, the image occupying the smallest amount of space in the ICO file is chosen.

(Clearly, the density-adjusted maximum favicon size is given by R.dimen.favicon_largest_interesting_size and needs to manually be set to the value in dp of the largest favicon we ever want in our UI, but that's old news.)

Once we've worked out which things we want to keep, everything else is deleted from the input buffer to save space. It is this trimmed buffer which is subsequently stored in the database.

ICODecoder now provides a decodeBitmapAtIndex(i) method which returns the Bitmap represented by the Icon Directory Entry at index i in the *trimmed* ICO. This method implements the required trickery to talk Android's decoder into decoding BMP payloads for us without corrupting the input buffer, and without duplicating the buffer.
The iterator provided by the ICODecoder just returns the values of decodeBimapAtIndex for increasing values of i over the range of available IconDirectoryEntry objects. This is passed to the Favicon cache's method for handling such things.

There remains the question about what to store in the database, but I think what we now have is sensible (As sensible as storing Favicons in the database at all is, anyway.)
Previously, all favicons were re-encoded as PNGs and stored. It's perhaps worth considering if this time/space tradeoff is worthwhile in general, but that's another problem. (I'd suggest doing "If it's bigger than X try re-encoding it, else just store it)
This patch stores all non-ICO favicons the old way (ie. Re-encoded as PNGs) and all favicons verbatim (But stripped of their uninteresting/corrupt components). Since all unusefully enormous favicons are culled on arrival, this protects us from having an enormous favicon dropped on us (And also gives us the fun property of memory requirement going down for devices with crappier screens (And presumably crappier memory)).


Some mucking about took place in LoadFaviconTask, as well, to enable it to support the notions of multiple images being returned. (It now concludes most requests with a call to the cache to get the real result, after the actual load task resulted in the desired images landing in the cache.). Hopefully this makes sense when read.

So actually that turned out to not be nearly as complicated as expected. There's also almost certainly some fun bugs somewhere in that splurge of memcpy madness.

Next step: Unit tests, when time (And biscuit supplies) permit.

> > > That, and the thinking was to have the decoder really just decode the ICO
> > > container format, leaving the decoding of the actual image payloads to the
> > > caller.

This idea was discarded.

> > In that case, we really want the decoder to return (or expose) something
> > akin to the ICO directory format itself, right?

It now does. Sort of.

> > That's a reasonable concern. If we can't experiment with this, can we make
> > it an easily switchable behavior now, while we still all understand how this
> > works?

This current approach for selecting best BPP appears to work well in all cases I can find. (Including the crazy NVidia.com favicon). Yay!
Attachment #833286 - Attachment is obsolete: true
Attachment #833286 - Flags: review?(rnewman)
Attachment #8333625 - Flags: review?(rnewman)
Comment on attachment 8333625 [details] [diff] [review]
V10 - Add support for decoding ICOs, now making use of all constituents.

Review of attachment 8333625 [details] [diff] [review]:
-----------------------------------------------------------------

Half-reviewed. See comments about compaction, mostly.

::: mobile/android/base/db/LocalBrowserDB.java
@@ +734,5 @@
>          if (b == null) {
>              return null;
>          }
>  
> +        return FaviconDecoder.decodeFavicon(b, 0, b.length);

Nit: would be nice to have a version of this method that just takes the byte array.

::: mobile/android/base/favicons/LoadFaviconTask.java
@@ +227,5 @@
> +        }
> +
> +        // This may not be provided, but if it is, it's useful.
> +        final long entityReportedLength = entity.getContentLength();
> +        InputStream contentStream = null;

Move this declaration down to where it's used.

@@ +235,5 @@
> +            // Integer overflow should not be a problem for Favicon sizes...
> +            bufferSize = (int) entityReportedLength+1;
> +        } else {
> +            // No declared size, so guess and reallocate later if it turns out to be too small.
> +            bufferSize = 25000;

What kind of computer scientist are you?!

Power of two, please :P

@@ +267,5 @@
> +                }
> +            }
> +        } finally {
> +            if (contentStream != null) {
> +                contentStream.close();

Rework the assignment to not use this god-awful cargo-culted ancient-Fennec pattern. It hides bugs, spreads out logic, and looks awful.

  final InputStream contentStream = entity.getContent();
  if (contentStream == null) {
      // Oh look! An error case that we don't handle in this patch!
      return ???;
  }
  try {
      int lastRead = 0;
      while (lastRead != -1) {
          …
      }
  } finally {
      contentStream.close();
  }

@@ +350,5 @@
>  
> +        LoadFaviconResult loadedBitmaps = loadFaviconFromDb();
> +        if (loadedBitmaps != null) {
> +            Favicons.putFaviconsInMemCache(mFaviconUrl, loadedBitmaps.getBitmaps());
> +            return Favicons.getSizedFaviconFromCache(mFaviconUrl, mTargetWidth);

These two lines...

@@ +377,5 @@
>  
> +        if (loadedBitmaps != null) {
> +            saveFaviconToDb(loadedBitmaps.getBytesForDatabaseStorage());
> +            Favicons.putFaviconsInMemCache(mFaviconUrl, loadedBitmaps.getBitmaps());
> +            return Favicons.getSizedFaviconFromCache(mFaviconUrl, mTargetWidth);

... and these three lines (which includes those two) are repeated. So let's break those out into a separate method or two, and add a couple of comments -- what happens if loadedBitmaps is empty? What if there's a cache miss?

And do we want to be saving the favicon to the DB *right now*, before we return? Or should we be queuing those up, either as a background runnable (that goes to the end of the line), or explicitly in a queue that can be committed in one go?

Lastly, it would seem that we should fill the cache before we hit the DB, lest we unnecessarily race with a future lookup.

::: mobile/android/base/favicons/decoders/FaviconDecoder.java
@@ +129,5 @@
> +        @Override
> +        public Bitmap next() {
> +            if (mBitmapAvailable) {
> +                mBitmapAvailable = false;
> +                return mBitmap;

You can null mBitmap at this point, and eliminate mBitmapAvailable in favor of a null check. Iterators don't support peeking.

::: mobile/android/base/favicons/decoders/ICODecoder.java
@@ +216,5 @@
> +            }
> +        }
> +
> +        // Abort if we find ourselves deleting every entry (Perhaps they're all corrupt?)
> +        if (mIconDirectory.length == 0) {

You can move this check to around line 207 -- you have entriesToErase.size() and entriesDiscovered.size(), which you use on 209.

@@ +220,5 @@
> +        if (mIconDirectory.length == 0) {
> +            return false;
> +        }
> +
> +        if (entriesToErase.size() != 0) {

Early return.

    mIsValid = true;

    if (entriesToErase.isEmpty()) {
        return true;
    }

    // Otherwise...

@@ +225,5 @@
> +            // Set the number of images field in the buffer to reflect the number of retained entries.
> +            mDecodand[mOffset + 4] = (byte) mIconDirectory.length;
> +            mDecodand[mOffset + 5] = (byte) (mIconDirectory.length >>> 8);
> +
> +            deleteEntries(entriesToErase);

I contend that it's worth figuring out in advance whether we want to do this.

If we have a 20KB favicon, and we're going to remove one erroneous 1KB value, is it worth it?

@@ +238,5 @@
> +     * entries we are not interested in.
> +     *
> +     * @param entriesToErase A List of IconDirectoryEntries to drop from the ICO.
> +     */
> +    private void deleteEntries(List<IconDirectoryEntry> entriesToErase) {

Three points:

1. You're really doing two things here: deleting 'files', and then defragmenting. The comment should reflect this.

2. By using data structures more intelligently, you can make this less work:
 
   * All records at the tail can be trivially truncated.
   * Compute how many headers will be removed (fixed size, all payloads need to move left by this amount).
   * Compact the headers leftward accordingly.
   * Compact the payloads leftwards into the free space. Every payload will need to move by (removed headers * header size), so try to copy each byte only once by starting at the left and moving right!

Start by sorting both the records to keep and the records to discard. If headers don't have to be in the same order as payloads, you'll need to do this twice.

3. You're never truncating mDecodand (you can't), so this does *not* save memory use -- it saves disk space and memory use *in the future*. If most favicons are transient, this is a waste of effort.

You might consider addressing number 3, and also making your life a lot easier, by doing a compacting copy:

* Allocating a new array of the correct size. (You know how big everything is that you're removing, and everything you're keeping.)
* Walk headers, spitting them out into a new byte array.
* Walk bodies, spitting them out into a new byte array.
* Update directory entries to match (trivial, because again you have sorted orders and known dimensions).

Note that there is likely to be a performance improvement by not having to worry about overlapping intervals in arraycopy.

If, however, this is a big buffer and you're removing very little (but why would that be the case?), compacting for storage might be worthwhile. Hard to say.

@@ +241,5 @@
> +     */
> +    private void deleteEntries(List<IconDirectoryEntry> entriesToErase) {
> +        for (IconDirectoryEntry entry : entriesToErase) {
> +            // Offset of the entry itself.
> +            int entryOffset = mOffset + ICO_HEADER_LENGTH_BYTES + (entry.mIndex * ICO_ICONDIRENTRY_LENGTH_BYTES);

int entryOffset = entry.getOffset();

@@ +250,5 @@
> +            // as the spec doesn't require IconDirectoryEntries to be in the order of their payloads.
> +            System.arraycopy(mDecodand, entryOffset + ICO_ICONDIRENTRY_LENGTH_BYTES, mDecodand, entryOffset, lengthFollowingHeaderEntry);
> +            mLen -= ICO_ICONDIRENTRY_LENGTH_BYTES;
> +
> +            if (entry.mIsErroraneous) {

Erroneous?

@@ +251,5 @@
> +            System.arraycopy(mDecodand, entryOffset + ICO_ICONDIRENTRY_LENGTH_BYTES, mDecodand, entryOffset, lengthFollowingHeaderEntry);
> +            mLen -= ICO_ICONDIRENTRY_LENGTH_BYTES;
> +
> +            if (entry.mIsErroraneous) {
> +                // Such entries are considered to have no meaningful payload. Just upate respecting

upate
Attachment #8333625 - Flags: review?(rnewman) → feedback+
(In reply to Richard Newman [:rnewman] from comment #33)
> Comment on attachment 8333625 [details] [diff] [review]
> V10 - Add support for decoding ICOs, now making use of all constituents.
> 
> Review of attachment 8333625 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Half-reviewed. See comments about compaction, mostly.
> 
> ::: mobile/android/base/db/LocalBrowserDB.java
> @@ +734,5 @@
> >          if (b == null) {
> >              return null;
> >          }
> >  
> > +        return FaviconDecoder.decodeFavicon(b, 0, b.length);
> 
> Nit: would be nice to have a version of this method that just takes the byte
> array.
> 
> ::: mobile/android/base/favicons/LoadFaviconTask.java
> @@ +227,5 @@
> > +        }
> > +
> > +        // This may not be provided, but if it is, it's useful.
> > +        final long entityReportedLength = entity.getContentLength();
> > +        InputStream contentStream = null;
> 
> Move this declaration down to where it's used.
> 
> @@ +235,5 @@
> > +            // Integer overflow should not be a problem for Favicon sizes...
> > +            bufferSize = (int) entityReportedLength+1;
> > +        } else {
> > +            // No declared size, so guess and reallocate later if it turns out to be too small.
> > +            bufferSize = 25000;
> 
> What kind of computer scientist are you?!

The sort that turns unreasonably large quantities of biscuits into (Usually) working programs and, occasionally, proofs...
...Or at least noise.

> Power of two, please :P

Oooh, computer science. :P

The amortised cost of repeatedly doubling buffer size will be as small as possible for a power-of-two buffer (By a miniscule amount, in theory, neglecting implementation quirks, terms and conditions apply, batteries not inclued.).
At the kernel level, if the amount you want to allocate happens to be a multiple of the memory allocator's block size then you'll waste very slightly less memory at the end of the buffer, and it's sort of fair to assume getting a power of two number of bytes will ensure this happens (Unless you're dealing with some weird hardware and a weird OS). 
Unfortunately, Java arrays have associated metadata, the format of which is not given by the specification (iirc). Whenever you allocate an array in Java, then, the actual malloc'd amount will be a little larger than what you asked for (Probably! It's not specified - maybe your VM keeps the metadata centrally somewhere or something.). Certainly, the HotSpot VM sticks this in a few bytes at the start of the array. 

As such, it is not possible, in general, to ensure that an array allocated in Java actually ends up occupying a quantity of memory that fits neatly into an integer number of pages. You can look up the implementation of the actual VM you're targeting and tune for that (Setting the array size to desired power of two subtract the header size), but the actual gains are miniscule.  The pointfulness of such non-portable micro-optimisations are debatable.


So, to answer your question again - one who finds the Java language specifcation and the Java virtual machine specification to be suitable bedtime reading. :P


Sure, I'll change it to 32768 if you think it'll make the code prettier - but (Unless I've missed something) you don't get to claim that it makes a meaningful performance difference. (In fact, since the metadata size of an array is liable to be small, a power-of-two array is *more* likely to end up wasting most of a block, as it was only pushed into the last block by those few extra bytes (Unless they, of course, happen to be a multiple of the block size.))

> @@ +267,5 @@
> > +                }
> > +            }
> > +        } finally {
> > +            if (contentStream != null) {
> > +                contentStream.close();
> 
> Rework the assignment to not use this god-awful cargo-culted ancient-Fennec
> pattern. It hides bugs, spreads out logic, and looks awful.

Oh, duh. Sorry.
It's actually not even clear that this call can return null. The docs seem to suggest it cannot. That's what I get for preserving the old logic so dilligently in places. :P

> @@ +350,5 @@
> >  
> > +        LoadFaviconResult loadedBitmaps = loadFaviconFromDb();
> > +        if (loadedBitmaps != null) {
> > +            Favicons.putFaviconsInMemCache(mFaviconUrl, loadedBitmaps.getBitmaps());
> > +            return Favicons.getSizedFaviconFromCache(mFaviconUrl, mTargetWidth);
> 
> These two lines...
> 
> @@ +377,5 @@
> >  
> > +        if (loadedBitmaps != null) {
> > +            saveFaviconToDb(loadedBitmaps.getBytesForDatabaseStorage());
> > +            Favicons.putFaviconsInMemCache(mFaviconUrl, loadedBitmaps.getBitmaps());
> > +            return Favicons.getSizedFaviconFromCache(mFaviconUrl, mTargetWidth);
> 
> ... and these three lines (which includes those two) are repeated. So let's
> break those out into a separate method or two, and add a couple of comments
> -- what happens if loadedBitmaps is empty? What if there's a cache miss?

If there are no bitmaps decoded, the returned LoadFaviconResult is null, so the null check should stop us entering this block.

There *cannot* be a cache miss here. Ostensibly.
This is perhaps questionable design, but it's unclear quite how to make it neater.

A LoadFaviconTask represents a wish to load a favicon for url X at size Y. It returns a single bitmap to that effect.
But the operation of loading a favicon may yeild a collection of bitmaps, of which only one is interesting to this LoadFaviconTask (But other chained tasks may find other results from the collection interesting.). Since the logic for dealing with getting scaled versions of downloaded images to satisfy requests is all present (And irritatingly complicated) in the cache, the approach taken is that downloaded favicons get put into the cache and then the actual request is satisfied with a call to the cache (Which then scales as needed and marks the *actual* favicon we used as most recently used. This has the further handy effect of causing elements loaded from a favicon we downloaded that we have not had a request for to fall down the recently-used list and become more eligible for eviction.).
In the event that a call to extract elements from the cache that happens directly after a call to populate the cache with those elements misses (Which happens only in some really crazy scenarios, like we're dropping caches to recover memory because we're running out), then the LoadFaviconTask will simply return null. The requestor will get the null favicon returned - we'll get a globe icon showing up in the UI. Given the contrived circumtances necessary for this to actually happen, this seems reasonable.

> 
> And do we want to be saving the favicon to the DB *right now*, before we
> return? Or should we be queuing those up, either as a background runnable
> (that goes to the end of the line), or explicitly in a queue that can be
> committed in one go?

This is the way we've always done it before, so this is the way we're going to keep on doing it in a bug unrelated to that problem.

That said, this does seem like something we should change. It makes lots of sense to batch all these writes and push them to disk in one go. The cost of writing to the DB is very overheadful, so batching is liable to produce big wins. But not in this bug!

> Lastly, it would seem that we should fill the cache before we hit the DB,
> lest we unnecessarily race with a future lookup.

... Which prevents us from pushing those two lines into a helper method.

You're right, this would produce a performance gain in obscure situations (We'd avoid spawning another LoadFaviconTask to chain on this one and instead just hit the cache right away).
Worth the extra code untidiness? Questionable. It's not exactly going to happen often. (Or is it?)

> ::: mobile/android/base/favicons/decoders/FaviconDecoder.java
> @@ +129,5 @@
> > +        @Override
> > +        public Bitmap next() {
> > +            if (mBitmapAvailable) {
> > +                mBitmapAvailable = false;
> > +                return mBitmap;
> 
> You can null mBitmap at this point, and eliminate mBitmapAvailable in favor
> of a null check. Iterators don't support peeking.

Also removing remove(), because wat.

> @@ +238,5 @@
> > +     * entries we are not interested in.
> > +     *
> > +     * @param entriesToErase A List of IconDirectoryEntries to drop from the ICO.
> > +     */
> > +    private void deleteEntries(List<IconDirectoryEntry> entriesToErase) {
> 
> Three points:
> 
> 1. You're really doing two things here: deleting 'files', and then
> defragmenting. The comment should reflect this.
> 
> 2. By using data structures more intelligently, you can make this less work:
>  
>    * All records at the tail can be trivially truncated.
>    * Compute how many headers will be removed (fixed size, all payloads need
> to move left by this amount).
>    * Compact the headers leftward accordingly.
>    * Compact the payloads leftwards into the free space. Every payload will
> need to move by (removed headers * header size), so try to copy each byte
> only once by starting at the left and moving right!
> 
> Start by sorting both the records to keep and the records to discard. If
> headers don't have to be in the same order as payloads, you'll need to do
> this twice.

Indeed. I thought about this - I was in a hurry - only had a borrowed phone for one night and wanted something working. Still, that problem seems to be going away (Yaay!), and your suggestions are entirely sensible.

> 
> 3. You're never truncating mDecodand (you can't), so this does *not* save
> memory use -- it saves disk space and memory use *in the future*. If most
> favicons are transient, this is a waste of effort.

This is true. But since every favicon gets written to the db, this does save disk space, as well as memory the next time we load this particular icon from the db.

> You might consider addressing number 3, and also making your life a lot
> easier, by doing a compacting copy:

This actually seems like a good idea anyway. So the plan would essentially be:

Check if buffer has a nontrivial amount of empty space at end or not (If the server gave us the right size value for the file it'll already be exactly right).
Check if we want to remove a meaningful amount of stuff from this favicon.
If either of the above are true, copy to a new buffer what we want to keep.

Also of note - the current approach will *keep* rubbish in the file. A compacting copy will drop any nonsense data between the payloads. (So if a header is corrupt but still has a large associated payload in the file that we can't find then we probably shouldn't keep it..)
So. Yes. Compacting copy FTW. It's also a load simpler. Remind me never to write a filesystem.

> @@ +225,5 @@
> > +            // Set the number of images field in the buffer to reflect the number of retained entries.
> > +            mDecodand[mOffset + 4] = (byte) mIconDirectory.length;
> > +            mDecodand[mOffset + 5] = (byte) (mIconDirectory.length >>> 8);
> > +
> > +            deleteEntries(entriesToErase);
> 
> I contend that it's worth figuring out in advance whether we want to do this.
> 
> If we have a 20KB favicon, and we're going to remove one erroneous 1KB
> value, is it worth it?

If we're doing a copy anyway (To compact the buffer, as we found that there was loads of extra space) then we might as well strip out every last tiny thing we can (If it's not much extra work).
If we're not already wanting to copy, then it makes sense to keep little bits of rubbish. Seems sort of sane.

> @@ +250,5 @@
> > +            // as the spec doesn't require IconDirectoryEntries to be in the order of their payloads.
> > +            System.arraycopy(mDecodand, entryOffset + ICO_ICONDIRENTRY_LENGTH_BYTES, mDecodand, entryOffset, lengthFollowingHeaderEntry);
> > +            mLen -= ICO_ICONDIRENTRY_LENGTH_BYTES;
> > +
> > +            if (entry.mIsErroraneous) {
> 
> Erroneous?

Fascinating. This one is such a common mistake Google doesn't "Did you mean..." you to the right one...
(In reply to Chris Kitching [:ckitching] from comment #34)


> So, to answer your question again - one who finds the Java language
> specifcation and the Java virtual machine specification to be suitable
> bedtime reading. :P

You have out-nerded me :D

Much of this is simply cultural convention -- to most of us, seeing a non-power-of-two array size is kinda weird-feeling. But if you have a good justification, that's all I need!


> If there are no bitmaps decoded...
> There *cannot* be a cache miss here...

Perfect for a brief comment.


> This is the way we've always done it before, so this is the way we're going
> to keep on doing it in a bug unrelated to that problem.

File a follow-up on the next-level icon shenanigans bug?


> That said, this does seem like something we should change. It makes lots of
> sense to batch all these writes and push them to disk in one go. The cost of
> writing to the DB is very overheadful, so batching is liable to produce big
> wins. But not in this bug!

Take me down to the follow-up city / where the bugs are well-scoped and the code lands land promptly


> You're right, this would produce a performance gain in obscure situations
> (We'd avoid spawning another LoadFaviconTask to chain on this one and
> instead just hit the cache right away).
> Worth the extra code untidiness? Questionable. It's not exactly going to
> happen often. (Or is it?)

Depends how long that DB write takes, doesn't it?


> So. Yes. Compacting copy FTW. It's also a load simpler. Remind me never to
> write a filesystem.

On the contrary, your approach is how a disk defragmenter works, because definitionally you're working within a fixed-size extent.
See Also: → 957400
So, finally got back to this...

Compacting copy FTW.

I've added a tuning parameter, ICODecoder.COMPACT_THRESHOLD. If compacting can save more than that many bytes, a compacting copy is performed. If not, we store the bytes verbatim in the database. I've semi-arbitrarily set it to 4KB for now.
Presumably the followup which does the async favicons-to-database flushing will want to think about that value.

On the topic of the compacting-copy... We could do it without reallocating the buffer if you'll r+ a use of sun.misc.Unsafe... :P (Allows us to truncate buffers without reallocation, among other things.).

Another thing that might be worth considering, but which I doubt is worth the CPU time, would be to re-encode all images to ensure they're all nice, efficiently-encoded PNGs. I suspect the usually-small space-saving this would produce wouldn't be worth the largeish cost of that calculation, and there's no easy way to figure out how much smaller it'll get under re-encoding.

(Also addressed your slew of other, less-interesting complaints).

(In reply to Richard Newman [:rnewman] from comment #35)
> > This is the way we've always done it before, so this is the way we're going
> > to keep on doing it in a bug unrelated to that problem.
> 
> File a follow-up on the next-level icon shenanigans bug?

Bug 957405

> Take me down to the follow-up city / where the bugs are well-scoped and the
> code lands land promptly

Good luck with that.

On a related note - things I ran into while solving this:

Bug 957400
Bug 957404
Attachment #8333625 - Attachment is obsolete: true
Attachment #8356903 - Flags: review?(rnewman)
... And now without large chunks of debugging code, too...
Attachment #8356903 - Attachment is obsolete: true
Attachment #8356903 - Flags: review?(rnewman)
Attachment #8356906 - Flags: review?(rnewman)
Comment on attachment 8356906 [details] [diff] [review]
V12 - Add support for decoding ICOs, now with 74% less insanity.

Review of attachment 8356906 [details] [diff] [review]:
-----------------------------------------------------------------

I would love to see some tests for this.

::: mobile/android/base/favicons/LoadFaviconTask.java
@@ +204,5 @@
> +    }
> +
> +    /**
> +     * Download the Favicon bitmap from the given URL and pass it to the decoder function. Return
> +     * the resulting bitmap to the caller.

Comment incorrect.

@@ +207,5 @@
> +     * Download the Favicon bitmap from the given URL and pass it to the decoder function. Return
> +     * the resulting bitmap to the caller.
> +     *
> +     * @param targetFaviconURL URL of the favicon to download.
> +     * @return The decoded favicon, or null if no or corrupt data ware received.

Also needs tweaking.

@@ +230,5 @@
> +        int bufferSize;
> +        if (entityReportedLength > 0) {
> +            // The size was reported and sane, so let's use that.
> +            // Integer overflow should not be a problem for Favicon sizes...
> +            bufferSize = (int) entityReportedLength+1;

Spaces around arithmetic.

@@ +233,5 @@
> +            // Integer overflow should not be a problem for Favicon sizes...
> +            bufferSize = (int) entityReportedLength+1;
> +        } else {
> +            // No declared size, so guess and reallocate later if it turns out to be too small.
> +            bufferSize = 25000;

DEFAULT_FAVICON_BUFFER_SIZE

@@ +239,5 @@
> +
> +        // Allocate a buffer to hold the raw favicon data downloaded.
> +        byte[] buffer = new byte[bufferSize];
> +
> +        // A pointer to the empty space in the buffer with smallest index.

"The offset of the start of the buffer's free space".

@@ +250,5 @@
> +            // Fully read the entity into the buffer - decoding of streams is not supported
> +            // (and questionably pointful - what would one do with a half-decoded Favicon?)
> +            while (lastRead != -1) {
> +                // Read as many bytes as are currently available into the buffer.
> +                lastRead = contentStream.read(buffer, bPointer, buffer.length-bPointer);

Spaces around arithmetic.

::: mobile/android/base/favicons/decoders/FaviconDecoder.java
@@ +83,5 @@
> +     */
> +    public static LoadFaviconResult decodeFavicon(byte[] buffer, int offset, int length) {
> +        LoadFaviconResult result;
> +        if (isDecodableByAndroid(buffer, offset)) {
> +            result  = new LoadFaviconResult();

Double space.

@@ +87,5 @@
> +            result  = new LoadFaviconResult();
> +            result.mOffset = offset;
> +            result.mLength = length;
> +            result.mHasMultipleBitmaps = false;
> +            result.mBitmapsDecoded = new SingleBitmapIterator(BitmapUtils.decodeByteArray(buffer, offset, length));

We assume here that decodeByteArray doesn't hold on to the entire supplied buffer -- worst case, each of our buffers will be twice the necessary size.

@@ +93,5 @@
> +
> +            return result;
> +        }
> +
> +        IconDirectoryEntry.sMaxBPP = GeckoAppShell.getScreenDepth();

Let's remove the dependency on GeckoAppShell, and have this be a static method that a caller (GAS, in this case) can call: setMaxBitsPerPixel.

@@ +104,5 @@
> +        if (result == null) {
> +            return null;
> +        }
> +
> +        result.mHasMultipleBitmaps = decoder.getImageCount() > 1;

Why doesn't the decoder set that field?

@@ +108,5 @@
> +        result.mHasMultipleBitmaps = decoder.getImageCount() > 1;
> +
> +        return result;
> +    }
> +    public static LoadFaviconResult decodeFavicon(byte[] buffer) {

Newline after }.

::: mobile/android/base/favicons/decoders/ICODecoder.java
@@ +167,5 @@
> +                // If we already have a smaller image larger than the maximum size of interest, we
> +                // don't care about the new one which is larger than the largest image larger than
> +                // the maximum size.
> +                if (newEntry.mWidth <= minimumMaximum) {
> +                    continue;

The comment and this conditional don't agree.

The code says: if this too-large entry is smaller than a previous oversized image, throw it away. That is, keep one oversized image, and make it the biggest one.

The comment says: if we have a smaller oversized image, keep it. (But the comment is also misphrased in the second clause -- s/largest/smallest.)

::: mobile/android/base/favicons/decoders/IconDirectoryEntry.java
@@ +1,1 @@
> +package org.mozilla.gecko.favicons.decoders;

License.

@@ +32,5 @@
> +     * Method to get a dummy Icon Directory Entry with the Erroneous bit set.
> +     *
> +     * @return An erroneous placeholder Icon Directory Entry.
> +     */
> +    public static IconDirectoryEntry getErroraneous() {

getErroneousEntry

::: mobile/android/base/favicons/decoders/LoadFaviconResult.java
@@ +1,1 @@
> +package org.mozilla.gecko.favicons.decoders;

License.

@@ +56,5 @@
> +                Log.w(LOGTAG, "Favicon compression failed.");
> +            }
> +
> +            // Restore the iterator to its original state. (Ugh.).
> +            mBitmapsDecoded = new FaviconDecoder.SingleBitmapIterator(favicon);

Can't you peek at the iterator?
Attachment #8356906 - Flags: review?(rnewman) → review+
(In reply to Richard Newman [:rnewman] from comment #39)
> I would love to see some tests for this.

It would be excellent. I'll file a followup and address it at some point before the thermal death of the universe.
(Seriously, if you want those done in the next two months I'm probably not your guy. Some favicons that I know have amusing propeties off the top of my head:
NVIDIA.com has an ICO containing three different colour depths and three different sizes for each - a most nifty property for unit testing. These are bitmap payloads.
Microsoft.com has an ICO containing six sizes, going up as far as 128x128 - probably handy for verifying the "Bin sizes we don't care about" logic works. It also (IIRC) claims that every one of these has a colour depth of zero... (Also BMP payloads)
Golem.de has an ICO containing a really odd selection of 5 sizes - four small ones and one gigantic 256x256 one (The largest possible size). These are, I think, PNG payloads.
Those ones will be handy for a start. Tactfully maiming a few of them to be corrupt or otherwise and, well, you know the drill. Please make the random-data test be deterministic (ie. No random test inputs). I'll stop stating the obvious now.)


Am I clear to land this one before the tests? I realise this is *terrible* practice, but any ICO-decoding is better than no ICO-decoding - unless a regression-party occurs the result will, most likely, be at least be almost, but not quite, entirely unlike failure.

> @@ +250,5 @@
> > +            // Fully read the entity into the buffer - decoding of streams is not supported
> > +            // (and questionably pointful - what would one do with a half-decoded Favicon?)
> > +            while (lastRead != -1) {
> > +                // Read as many bytes as are currently available into the buffer.
> > +                lastRead = contentStream.read(buffer, bPointer, buffer.length-bPointer);
> 
> Spaces around arithmetic.

Why so no automated linting tool? (Actually, *I* should be using one locally, anyway)

> @@ +87,5 @@
> > +            result  = new LoadFaviconResult();
> > +            result.mOffset = offset;
> > +            result.mLength = length;
> > +            result.mHasMultipleBitmaps = false;
> > +            result.mBitmapsDecoded = new SingleBitmapIterator(BitmapUtils.decodeByteArray(buffer, offset, length));
> 
> We assume here that decodeByteArray doesn't hold on to the entire supplied
> buffer -- worst case, each of our buffers will be twice the necessary size.

True.

> 
> @@ +93,5 @@
> > +
> > +            return result;
> > +        }
> > +
> > +        IconDirectoryEntry.sMaxBPP = GeckoAppShell.getScreenDepth();
> 
> Let's remove the dependency on GeckoAppShell, and have this be a static
> method that a caller (GAS, in this case) can call: setMaxBitsPerPixel.

... And also we then stop brainlessly calling it every time we decode something...

My placement of the call site should perhaps be checked for sanity.

> @@ +104,5 @@
> > +        if (result == null) {
> > +            return null;
> > +        }
> > +
> > +        result.mHasMultipleBitmaps = decoder.getImageCount() > 1;
> 
> Why doesn't the decoder set that field?

Because E_INSUFFICIENT_CAFFIENE.

> @@ +108,5 @@
> > +        result.mHasMultipleBitmaps = decoder.getImageCount() > 1;
> > +
> > +        return result;
> > +    }
> > +    public static LoadFaviconResult decodeFavicon(byte[] buffer) {
> 
> Newline after }.

Sorry. Habit of mine to not do that when the second method is a default-argument-setting override of the method above.

> 
> ::: mobile/android/base/favicons/decoders/ICODecoder.java
> @@ +167,5 @@
> > +                // If we already have a smaller image larger than the maximum size of interest, we
> > +                // don't care about the new one which is larger than the largest image larger than
> > +                // the maximum size.
> > +                if (newEntry.mWidth <= minimumMaximum) {
> > +                    continue;
> 
> The comment and this conditional don't agree.
> 
> The code says: if this too-large entry is smaller than a previous oversized
> image, throw it away. That is, keep one oversized image, and make it the
> biggest one.
> 
> The comment says: if we have a smaller oversized image, keep it. (But the
> comment is also misphrased in the second clause -- s/largest/smallest.)

This looks like an actual bug. Well spotted.

What we want is to keep the smallest image larger than the limit. That way, if an oversized image exists, we certainly have a nice big one to scale down to satisfy requests up near the maximum desired favicon size, but it's the minimal such one (No point wasting memory on a bigger one bigger than the biggest you'll ever display when there's a smaller one you can have.) (And, indeed, the behaviour of the cache is such that, really, you won't have the oversized one in memory for that long if it's larger than MAX_INTERESTING_FAVICON_SIZE, either.)

So. Yes. I've written my comment correctly(ish) and my conditional perfectly backwards. Fail.

> ::: mobile/android/base/favicons/decoders/IconDirectoryEntry.java
> @@ +1,1 @@
> > +package org.mozilla.gecko.favicons.decoders;
> 
> License.

Couldn't some lawyer somewhere have just stipulated "This applies to all files in this directory" and saved us all some bother? :P

> 
> @@ +32,5 @@
> > +     * Method to get a dummy Icon Directory Entry with the Erroneous bit set.
> > +     *
> > +     * @return An erroneous placeholder Icon Directory Entry.
> > +     */
> > +    public static IconDirectoryEntry getErroraneous() {
> 
> getErroneousEntry

Me fail English? That's unpossible!

> @@ +56,5 @@
> > +                Log.w(LOGTAG, "Favicon compression failed.");
> > +            }
> > +
> > +            // Restore the iterator to its original state. (Ugh.).
> > +            mBitmapsDecoded = new FaviconDecoder.SingleBitmapIterator(favicon);
> 
> Can't you peek at the iterator?

The semantics of an Iterator are such that calling next() repeatedly shall give different results.

I guess I could turn this into a sort of bastardised iterator that has an extra peek method - useful in cases where you actually know you've got a SingleBitmapIterator (While keeping the implementation of the standard Iterator interface unchanged, allowing you to handily process arbitrary results with the same code.).
Yes. Let's do that.
Attachment #8356906 - Attachment is obsolete: true
(In reply to Chris Kitching [:ckitching] from comment #40)

> It would be excellent. I'll file a followup and address it at some point
> before the thermal death of the universe.

:P

> (Seriously, if you want those done in the next two months I'm probably not
> your guy. Some favicons that I know have amusing propeties off the top of my
> head:

Thanks for these. Great fodder.

Who knows: maybe I'll write these on a plane.

> Am I clear to land this one before the tests?

Yes.

> Why so no automated linting tool? (Actually, *I* should be using one
> locally, anyway)

Plans are afoot.

> ... And also we then stop brainlessly calling it every time we decode
> something...

Fortunately it's cached in GAS, so the perf wouldn't be a huge issue.

> My placement of the call site should perhaps be checked for sanity.
> Sorry. Habit of mine to not do that when the second method is a
> default-argument-setting override of the method above.

Makes sense. Continue if you wish.

> This looks like an actual bug. Well spotted.

Almost like reviewing code has a purpose! :D

> > getErroneousEntry
> 
> Me fail English? That's unpossible!

We do need to find a concept for which "erroraneous" is the best word. Too good to let it slip away.

> Yes. Let's do that.

Interestingly, Guava has a PeekingIterator class.
Blocks: 961335
(In reply to Richard Newman [:rnewman] from comment #41)
> > It would be excellent. I'll file a followup and address it at some point
> > before the thermal death of the universe.

Bug 961335.

> We do need to find a concept for which "erroraneous" is the best word. Too
> good to let it slip away.

I fear the result would be decidedly uncouth.

> Interestingly, Guava has a PeekingIterator class.

And lots of other shiny toys that we can plausibly get away with using now Proguard will bin the bits we don't want...

Might even be worth having a meta-bug for that. "Stuff we can now bin and replace with standard libraries now size isn't so much of a worry".

I shall shortly land this shizzle. Only four months later than planned. :P
Flags: in-moztrap?
Hardware: ARM → All
Target Milestone: --- → Firefox 29
We won't be adding manually ran ICODecoder tests; bug 961335 is a better solution.
Flags: in-moztrap? → in-moztrap-
Blocks: 921433
https://hg.mozilla.org/mozilla-central/rev/1c402a47da51
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Depends on: 961498
Depends on: 961499
Depends on: 961538
Blocks: 961600
I love those sharp favicons! Thanks for your continued work on this, Chris!
(... and to not just spam here, I'm noming this to be mentioned in the rel-notes)
relnote-firefox: --- → ?
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.