Last Comment Bug 715308 - Decode ::Draw()'n images before other images
: Decode ::Draw()'n images before other images
Status: RESOLVED FIXED
[snappy]
:
Product: Core
Classification: Components
Component: ImageLib (show other bugs)
: unspecified
: All All
: -- normal with 9 votes (vote)
: mozilla13
Assigned To: Justin Lebar (not reading bugmail)
:
Mentors:
: 671641 674547 689411 (view as bug list)
Depends on: 711269 715405 716523 721589 721917 723053 787766
Blocks: 676270 image-suck 716113 702579 716103 716140 726004
  Show dependency treegraph
 
Reported: 2012-01-04 13:09 PST by Justin Lebar (not reading bugmail)
Modified: 2012-09-02 00:29 PDT (History)
24 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Patch v1 (29.54 KB, patch)
2012-01-06 13:51 PST, Justin Lebar (not reading bugmail)
no flags Details | Diff | Splinter Review
Patch v1.1 (rebased to tip) (29.58 KB, patch)
2012-01-06 15:35 PST, Justin Lebar (not reading bugmail)
no flags Details | Diff | Splinter Review
Patch v1.2 (rebase properly) (29.58 KB, patch)
2012-01-06 16:06 PST, Justin Lebar (not reading bugmail)
no flags Details | Diff | Splinter Review
Part 2 (straw man?) - Decode all available bytes on call to SourceDataComplete() (6.41 KB, patch)
2012-01-10 17:03 PST, Justin Lebar (not reading bugmail)
no flags Details | Diff | Splinter Review
Part 1, v2 (28.23 KB, patch)
2012-01-10 17:04 PST, Justin Lebar (not reading bugmail)
rclickenbrock: feedback+
Details | Diff | Splinter Review
Part 1, v3: Decode RasterImage:Draw()'n images before other images. (28.76 KB, patch)
2012-01-12 11:21 PST, Justin Lebar (not reading bugmail)
joe: review-
Details | Diff | Splinter Review
Part 2, v1: Sync-decode the beginning of the image in RasterImage::SourceDataComplete() (7.77 KB, patch)
2012-01-12 11:22 PST, Justin Lebar (not reading bugmail)
joe: review+
Details | Diff | Splinter Review
Patch v4 (30.11 KB, patch)
2012-01-19 14:44 PST, Justin Lebar (not reading bugmail)
no flags Details | Diff | Splinter Review
Patch v5 (30.52 KB, patch)
2012-01-23 08:45 PST, Justin Lebar (not reading bugmail)
justin.lebar+bug: review-
Details | Diff | Splinter Review
Part 1, v6 (30.41 KB, patch)
2012-01-23 17:57 PST, Justin Lebar (not reading bugmail)
joe: review+
Details | Diff | Splinter Review
Part 0, v1: onload/onerror tests (69.08 KB, patch)
2012-01-25 10:55 PST, Justin Lebar (not reading bugmail)
joe: review+
Details | Diff | Splinter Review
Part 2: Add <object> fallback test. (67.57 KB, patch)
2012-01-25 12:04 PST, Justin Lebar (not reading bugmail)
joe: review+
Details | Diff | Splinter Review
Interdiff part 1 v6 --> v7 (8.68 KB, patch)
2012-01-26 09:10 PST, Justin Lebar (not reading bugmail)
no flags Details | Diff | Splinter Review
Part 1 v7 (31.10 KB, patch)
2012-01-26 09:11 PST, Justin Lebar (not reading bugmail)
joe: review+
Details | Diff | Splinter Review
Back out this and bug 721510 (142 bytes, text/plain)
2012-02-01 11:11 PST, Justin Lebar (not reading bugmail)
joe: review+
akeybl: approval‑mozilla‑aurora+
Details
Part 2: Fix crash, randomorange (3.18 KB, patch)
2012-02-08 22:11 PST, Justin Lebar (not reading bugmail)
joe: review+
Details | Diff | Splinter Review

Description Justin Lebar (not reading bugmail) 2012-01-04 13:09:55 PST
RasterImage::Draw() is called when an image is drawn on the screen, even if the image is not yet decoded.

So I think we can and should prioritize the decoding of Draw()'n images.  This should improve our "decode many images" story greatly.
Comment 1 Robert Lickenbrock [:rclick] 2012-01-04 14:04:35 PST
Bug 674547 (which I hoping to finish Real Soon Now) should be of help since the decode worker will have a queue of images that are waiting to decode. My local WIP is round-robin (i.e. do some decoding then punt to the end of the queue if it doesn't finish) to match the current behavior, but changing it to be sorted by some sort of priority in this bug should fairly straightforward, I think.
Comment 2 Justin Lebar (not reading bugmail) 2012-01-04 14:16:35 PST
Yes, bug 674547 should block this bug.  I'd forgotten about that one (even though I filed it!), and I was planning to do that work as part of this bug.

I don't want to steal bug 674547 away from you, but we really want this one, and the sooner the better, because we've had problems with regressions in the past.  Maybe you could post your incomplete patches there?
Comment 3 Justin Lebar (not reading bugmail) 2012-01-05 10:17:35 PST
Adding [snappy]: When switching tabs to an image-heavy page with discarded images, we should decode the images which appear on the screen before the images offscreen.  This will make decoding seem much faster.
Comment 4 Robert Lickenbrock [:rclick] 2012-01-06 11:34:33 PST
I don't have the patches on hand, unfortunately (they're on a SATA drive, but the older P4 system I'm using in the interim can only use IDE drives). However, I got the build environment up and running and I don't think it would take too long to duplicate the work I had. I'll see what I can get done over the weekend.

If it looks like I won't finish in a timely manor, I'll gladly hand off bug 674547. I'd hate for this to miss the Fx12 train due to being blocked on me and my personal issues.
Comment 5 Justin Lebar (not reading bugmail) 2012-01-06 12:13:42 PST
> I'll see what I can get done over the weekend.

Don't worry about it; I have a patch for this that's 95% done.

I'd definitely like your feedback on the patch, if you're able to have a look!  I'll mark it when I post the patch.
Comment 6 Justin Lebar (not reading bugmail) 2012-01-06 13:51:40 PST
Created attachment 586557 [details] [diff] [review]
Patch v1
Comment 7 Justin Lebar (not reading bugmail) 2012-01-06 13:52:20 PST
Comment on attachment 586557 [details] [diff] [review]
Patch v1

Note the dependency on bug 715405, which adds the linked list class.
Comment 8 Justin Lebar (not reading bugmail) 2012-01-06 15:35:54 PST
Created attachment 586602 [details] [diff] [review]
Patch v1.1 (rebased to tip)
Comment 9 Justin Lebar (not reading bugmail) 2012-01-06 16:06:09 PST
Created attachment 586607 [details] [diff] [review]
Patch v1.2 (rebase properly)
Comment 10 Justin Lebar (not reading bugmail) 2012-01-06 16:06:51 PST
Comment on attachment 586602 [details] [diff] [review]
Patch v1.1 (rebased to tip)

Sorry for the churn; still working out the kinks in this git workflow I'm trying out.
Comment 11 Justin Lebar (not reading bugmail) 2012-01-09 05:54:58 PST
This is leaking on try because it's calling ClearOnShutdown() after XPCOM shutdown.

Why on earth someone is decoding an image after XPCOM shutdown is beyond me...
Comment 12 Kyle Huey [:khuey] (khuey@mozilla.com) 2012-01-09 06:00:37 PST
Got a stack?
Comment 13 Justin Lebar (not reading bugmail) 2012-01-09 06:49:39 PST
Well, now that I think about it, we don't need to ClearOnShutdown() the decode worker.  Weak references should do the trick.  (Unless trace-malloc will complain that I'm leaking a weakref, in which case I may throttle it.)
Comment 14 Justin Lebar (not reading bugmail) 2012-01-09 16:22:14 PST
Okay, fixed the leak.  Still some test failures.

https://tbpl.mozilla.org/?tree=Try&rev=eb29adb5609e
Comment 15 Justin Lebar (not reading bugmail) 2012-01-10 17:03:39 PST
Created attachment 587543 [details] [diff] [review]
Part 2 (straw man?) - Decode all available bytes on call to SourceDataComplete()
Comment 16 Justin Lebar (not reading bugmail) 2012-01-10 17:04:44 PST
Created attachment 587544 [details] [diff] [review]
Part 1, v2
Comment 17 Robert Lickenbrock [:rclick] 2012-01-10 20:28:35 PST
Comment on attachment 587544 [details] [diff] [review]
Part 1, v2

I think this would be cleaner if the code that does the actual decoding was factored out to a method on the RasterImage (RasterImage::Decode(), perhaps). That way in cases where immediate decoding is needed, we won't have to go through the DecodeWorker.

>   nsresult decoderStatus = decoder->GetDecoderError();
>   if (NS_FAILED(decoderStatus)) {
>     DoError();
>     return decoderStatus;
>   }
> 
>-  // Kill off the worker
>-  mWorker = nsnull;
>-  mWorkerPending = false;
>+  // Kill off our decode request, if it's pending.  (If not, this call is
>+  // harmless.)
>+  DecodeWorker::Singleton()->StopDecoding(this);

It looks like StopDecoding() won't get called if finalizing the decoder triggers an error as we'll return before we make it here. This should probably happen before the decoder is finalized.

>   if (!mDecoded && mHasSourceData) {
>       mDrawStartTime = TimeStamp::Now();
>+
>+      // We're drawing this image, so indicate that we should decode it as soon
>+      // as possible.
>+      DecodeWorker::Singleton()->MarkAsASAP(this);
>   }

The mHasSourceData condition will prevent prioritization of images that are mid-download. Is that what we actually want? It seems to me like we'd still want to prioritize the decoding even if we don't have all the data yet.

>+  // If we aren't yet finished decoding and we have more data in hand, re-post
>+  // ourselves to the decode worker.
>+  if (aImg->mDecoder && !aImg->IsDecodeFinished() && haveMoreData) {
>+    RequestDecode(aImg);
>+  }
nit: The !IsDecodeFinished() condition can be dropped. In the case where the decode has finished, we won't have a decoder because it would have been shutdown in the code just before this.

Aside from that the patch looks good to me. I don't appear to be able to set flags, but feedback+ from me.
Comment 18 Justin Lebar (not reading bugmail) 2012-01-10 21:45:59 PST
Comment on attachment 587543 [details] [diff] [review]
Part 2 (straw man?) - Decode all available bytes on call to SourceDataComplete()

I need to do some testing to determine how bad this is.

The right solution might involve bug 505385, which would let us separate out the network load-done event from the decoder's decode-done event (and would, in general, untangle the mess that is these events).  But that looks more complicated than I'd like to take on.

I guess I could replicate what the code currently does -- frob the decoder and let it run for however long it wants to run.  Joe, do you think that's a better solution than decoding all available data in SourceDataComplete()?
Comment 19 Justin Lebar (not reading bugmail) 2012-01-10 21:47:12 PST
Thanks for the feedback, Robert!  I'll have a closer look in the morning.

(You should have editbugs privilege now, by the way.  Let me know if you still can't set f+ and so on.)
Comment 20 Justin Lebar (not reading bugmail) 2012-01-10 21:54:54 PST
Comment on attachment 587543 [details] [diff] [review]
Part 2 (straw man?) - Decode all available bytes on call to SourceDataComplete()

This looks good on try: https://tbpl.mozilla.org/?tree=Try&rev=35ac1b918eb5
Comment 21 Joe Drew (not getting mail) 2012-01-11 10:14:26 PST
(In reply to Justin Lebar [:jlebar] from comment #18)
> I guess I could replicate what the code currently does -- frob the decoder
> and let it run for however long it wants to run.  Joe, do you think that's a
> better solution than decoding all available data in SourceDataComplete()?

I think I need more specific verbs than "frob" to evaluate the solution :)
Comment 22 Justin Lebar (not reading bugmail) 2012-01-11 10:57:06 PST
Heh, okay.  :)  So the three options are:

1) What we currently do: On SourceDataComplete(), synchronously invoke the decoder and let it run for one timeslice, where a timeslice is defined as however long the decoder decides to run for before it yields.

2) On SourceDataComplete(), synchronously invoke the decoder and wait for it to finish decoding the whole image.

3) On SourceDataComplete(), asynchronously invoke the decoder.  Then we need some way to call back into the imgRequest and tell it when the decode has finished.

There are tradeoffs associated with each of these options:

(1) This is semantically incorrect -- we may fire onload before an image is decoded, and we may not notice when an image has an error.  It also may defeat some of what we're trying to accomplish in this bug, in that we want to avoid each image being responsible for its own decoding, so that N images don't hog N timeslices in the event loop.

(2) This may block the event loop for a long time if there's a lot to decode; it's no better and may be much worse than (1) in this respect.

(3) This is probably the right approach, but it's complicated, particularly because the decoder notifications are a giant mess (bug 505385).

I pushed a patch to try with approach (3), but it's a total hack: I add a new event, OnDecoderShutdown to a new interface, imgDecoderObserver2.  When imgRequest calls RasterImage::SourceDataComplete(), it passes in a pointer to itself.  The RasterImage keeps this pointer around, and, in ShutdownDecoder(), calls OnDecoderShutdown(rv).

I feel bad adding more noodles to this spaghetti code (option 3).  But I also don't like writing semantically-incorrect code (option 1).
Comment 23 Joe Drew (not getting mail) 2012-01-11 11:56:33 PST
Option 1 sucks, but it (importantly) doesn't suck more than we currently suck. So it's my second choice.

Option 2 is right out because of the event loop blocking.

Option 3 is what I prefer, and here's how I'd go about doing it:

1) Make imgIDecoderObserver::onStopDecode actually signal the end of decoding. This is likely to be difficult/messy, but is the right long-term solution. See bug 505385 and bug 435296, and the brain of Bobby Holley.

2) Inside imgStatusTracker, wait for *both* onStopDecode and onStopRequest to fire either of them to an observer. That way, one or the other can happen first, but we always send them out in the correct order to observers.
Comment 24 Bobby Holley (:bholley) (busy with Stylo) 2012-01-11 12:16:00 PST
(In reply to Justin Lebar [:jlebar] from comment #22)

> (1) This is semantically incorrect -- we may fire onload before an image is
> decoded, and we may not notice when an image has an error.

One would think so, but this isn't how the de-facto standard plays out. I was worried about this back when I was doing decode-on-draw, because there's really no way to tell if an image has errors without decoding it, but we need to fire either |onload| or |onerror| at _load_ time, not at decode time. But I investigated it, and it turned out that browsers only fire |onerror| if the network load fails, or if the mimetype sniff fails (ie, the first few bytes are garbage). See bug 435296 comment 27.

So firing onload before an image is decoded isn't a big deal in itself. However, bz thought that any image that has _already_ started to decode should be allowed to finish before onload is fired. This is where the OnStartDecode->OnStopContainer onload blocking came from.
Comment 25 Justin Lebar (not reading bugmail) 2012-01-11 14:26:49 PST
Decoding just 64 bytes on SourceDataComplete(), only if we've previously decoded 0 bytes, seems to work.  That would be awesome...
Comment 26 Mozilla RelEng Bot 2012-01-11 16:30:42 PST
Try run for 0375f9ea82c5 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=0375f9ea82c5
Results (out of 218 total builds):
    exception: 69
    success: 86
    warnings: 27
    failure: 36
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-0375f9ea82c5
Comment 27 Justin Lebar (not reading bugmail) 2012-01-11 18:28:33 PST
Comment 26 is tbplbot being pretty slow.  The decode-64-bytes patch is https://tbpl.mozilla.org/?tree=Try&rev=f852066a6eae, which has some orange, but not as much.
Comment 28 Mozilla RelEng Bot 2012-01-11 22:01:22 PST
Try run for f852066a6eae is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=f852066a6eae
Results (out of 272 total builds):
    exception: 2
    success: 233
    warnings: 33
    failure: 4
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-f852066a6eae
 Timed out after 06 hours without completing.
Comment 29 Justin Lebar (not reading bugmail) 2012-01-12 06:58:35 PST
*** Bug 689411 has been marked as a duplicate of this bug. ***
Comment 30 Justin Lebar (not reading bugmail) 2012-01-12 08:32:37 PST
I think I got this.  https://tbpl.mozilla.org/?tree=Try&rev=e28f6812d616
Comment 31 Justin Lebar (not reading bugmail) 2012-01-12 11:21:18 PST
Created attachment 588116 [details] [diff] [review]
Part 1, v3: Decode RasterImage:Draw()'n images before other images.

Review: :JOEDREW!
Comment 32 Justin Lebar (not reading bugmail) 2012-01-12 11:22:12 PST
Created attachment 588117 [details] [diff] [review]
Part 2, v1: Sync-decode the beginning of the image in RasterImage::SourceDataComplete()

Review: :JOEDREW!
Comment 33 Mozilla RelEng Bot 2012-01-12 13:31:05 PST
Try run for e28f6812d616 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=e28f6812d616
Results (out of 273 total builds):
    exception: 1
    success: 236
    warnings: 25
    failure: 11
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-e28f6812d616
Comment 34 Justin Lebar (not reading bugmail) 2012-01-12 13:36:49 PST
That line of red TP4 runs is fishy.  It looks unrelated, but I'll dig in.
Comment 35 Joe Drew (not getting mail) 2012-01-12 13:43:32 PST
Comment on attachment 588116 [details] [diff] [review]
Part 1, v3: Decode RasterImage:Draw()'n images before other images.

Review of attachment 588116 [details] [diff] [review]:
-----------------------------------------------------------------

The biggest problems I have with this patch is that 1) ASAP requests can starve normal requests, and 2) AFAICT we don't round-robin through requests, so a long-decoding image can stop other images from getting any work done at all, which is a functional change from before.

I think if we just toss the image to the end of the linked list after work has been done on it, that would satisfy me for #2.

::: image/src/RasterImage.cpp
@@ +58,5 @@
>  #include "ImageLogging.h"
> +#include "mozilla/TimeStamp.h"
> +#include "mozilla/Telemetry.h"
> +#include "mozilla/Preferences.h"
> +#include "mozilla/ClearOnShutdown.h"

All four of these are included again below. You should just remove these additions.

@@ +231,5 @@
> +
> +    DecodeWorker()
> +      : mPendingInEventLoop(false)
> +    {
> +    }

{}

@@ +2543,5 @@
>    // If we get this far, dispatch the worker. We do this instead of starting
>    // any immediate decoding to guarantee that all our decode notifications are
>    // dispatched asynchronously, and to ensure we stay responsive.
> +  DecodeWorker::Singleton()->RequestDecode(this);
> +  return NS_OK;

Add a blank line before the return.

@@ +2906,5 @@
> +  }
> +
> +  request->mIsASAP = true;
> +
> +  if (request->isInList()) {

FWIW, I hope review comments for the LinkedList class cause LinkedList's capitalization to be Gecko-style. :)

@@ +2909,5 @@
> +
> +  if (request->isInList()) {
> +    // If the decode request is in a list, it must be in the normal decode
> +    // requests list -- if it had been in the ASAP list, then mIsASAP would
> +    // have been true above.  Move the request to the ASAP list.

You should add a comment saying that we should already be pending in this case, because RequestDecode() below depends on it.

You could add an assert/NS_ABORT_IF_FALSE if you wanted, for both pending and that the request is in the normal decode requests list.

@@ +2928,5 @@
> +
> +  if (request->mIsASAP) {
> +    mASAPDecodeRequests.insertBack(request);
> +  }
> +  else {

Cuddle else please.

@@ +2961,5 @@
> +  // During this trip through the event loop, we'll decode only ASAP requests
> +  // or only normal requests, but not both.  We need to spin the event loop
> +  // when we transition from ASAP to normal requests so the ASAP requests have
> +  // a chance to be drawn.
> +  LinkedList<DecodeRequest> &requestList =

LinkedList<DecodeRequest>& requestList

@@ +2963,5 @@
> +  // when we transition from ASAP to normal requests so the ASAP requests have
> +  // a chance to be drawn.
> +  LinkedList<DecodeRequest> &requestList =
> +    !mASAPDecodeRequests.isEmpty() ?
> +    mASAPDecodeRequests : mNormalDecodeRequests;

Put this all on one line for ease of reading. (I'd actually prefer if/else, but that's impossible with references.)

@@ +2969,5 @@
> +  while (true) {
> +    TimeStamp startIteration = TimeStamp::Now();
> +
> +    // Have we been decoding for too long?
> +    if ((startIteration - eventStart).ToMilliseconds() >= gMaxMSBeforeYield) {

I'd prefer if this while loop wasn't a while (true); perhaps a do... while loop would be better.

@@ +2986,5 @@
> +
> +  // If decode requests are pending, re-post ourself to the event loop.
> +  if (!mASAPDecodeRequests.isEmpty() || !mNormalDecodeRequests.isEmpty()) {
> +    mPendingInEventLoop = true;
> +    NS_DispatchToCurrentThread(this);

Put this (and the similar code in RequestDecode) inside a method:

void EnsurePending()
{
  if (!mPendingInEventLoop)
    // etc
}

@@ +2990,5 @@
> +    NS_DispatchToCurrentThread(this);
> +  }
> +
> +  Telemetry::Accumulate(Telemetry::IMAGE_DECODE_LATENCY,
> +                        PRUint32((TimeStamp::Now() - eventStart).ToMilliseconds()));

I think you need to test whether this is a size decode before accumulating, as they throw off the numbers.

@@ +3082,5 @@
> +
> +    if (NS_FAILED(aImg->ShutdownDecoder(RasterImage::eShutdownIntent_Done))) {
> +      aImg->DoError();
> +    }
> +    MOZ_ASSERT(!request->mIsASAP);

I don't understand why this assert is here.

@@ +3088,5 @@
> +
> +  // If we aren't yet finished decoding and we have more data in hand, re-post
> +  // ourselves to the decode worker.
> +  if (aImg->mDecoder && !aImg->IsDecodeFinished() && haveMoreData) {
> +    RequestDecode(aImg);

I think this is already handled in ::Run(), above, right?

We should only do it in one spot, probably Run(). And if it *is* Run(), we should rename this function, since it only decodes part of the image.

::: image/src/RasterImage.h
@@ +168,3 @@
>  class Decoder;
>  
> +namespace internal {

Why not make this a subclass of RasterImage?

@@ +182,5 @@
> +  DecodeRequest(RasterImage* aImage)
> +    : mImage(aImage)
> +    , mIsASAP(false)
> +  {
> +  }

{} on the same line is the usual convention, to show that there is intentionally no body.

::: modules/libpref/src/init/all.js
@@ +3283,5 @@
>  // Chunk size for calls to the image decoders
>  pref("image.mem.decode_bytes_at_a_time", 4096);
>  
>  // The longest time we can spend in an iteration of an async decode
> +pref("image.mem.max_ms_before_yield", 10);

Why should we change this to 10 ms as part of this patch?

Also: As currently written, this same time will be used as the timeout both for individual images and the multiple-image-decoder worker. Why should they have the same timeout?
Comment 36 Joe Drew (not getting mail) 2012-01-12 13:47:35 PST
Comment on attachment 588117 [details] [diff] [review]
Part 2, v1: Sync-decode the beginning of the image in RasterImage::SourceDataComplete()

Review of attachment 588117 [details] [diff] [review]:
-----------------------------------------------------------------

::: image/src/RasterImage.cpp
@@ +239,5 @@
>  
> +    /* Decode some chunks of the given image.  If aBriefDecode is set, decode
> +     * just a few bytes if no bytes have been decoded yet -- do just enough for
> +     * us to figure out if there's an error in the image. */
> +    nsresult DecodeImage(RasterImage* aImg, bool aBriefDecode = false);

Instead of a boolean, use an enumerated type.

@@ +3019,5 @@
>    // If an error is flagged, it probably happened while we were waiting
>    // in the event queue. Bail early, but no need to bother the run queue
>    // by returning an error.
>    if (aImg->mError)
> +    return NS_OK;

You went to a bit of work removing the nsresults from imgDecodeWorker::Run() when moving it to DecodeImage, and then put them back in this patch. Maybe it'd be better just to leave them in.

@@ +3031,5 @@
>  
> +  PRUint32 maxBytes;
> +  if (!aBriefDecode) {
> +    // We really do mean it when we say "brief".
> +    maxBytes = 32;

This should be a pref.

@@ +3052,5 @@
>    // 1) We don't have any data left to decode
>    // 2) The decode completes
>    // 3) We hit the deadline and need to yield to keep the UI snappy
>    while (haveMoreData && !aImg->IsDecodeFinished() &&
> +         TimeStamp::Now() < deadline) {

Unrelated change; you can put it in patch 1 if you want.

@@ +3067,5 @@
>      haveMoreData = aImg->mSourceData.Length() > aImg->mBytesDecoded;
> +
> +    // Iterate only once if we're a brief decode.
> +    if (aBriefDecode)
> +      break;

It would be cleaner if this was changed into a do...while loop, because this could be added to the condition.
Comment 37 Justin Lebar (not reading bugmail) 2012-01-12 14:12:33 PST
> 1) ASAP requests can starve normal requests

Is this a problem?  By definition, a normal request isn't onscreen.  So why do we care if it's starved to make room for ASAP requests?  (It'll become ASAP as soon as it is Draw()'n.)

> 2) AFAICT we don't round-robin through requests, so a long-decoding image can stop other images from 
> getting any work done at all, which is a functional change from before.

In DecodeWorker::Run, we popFirst(), but in DecodeWorker::RequestDecode() we insertBack().  So we do round-robin.

> Also: As currently written, this same time will be used as the timeout both for individual images 
> and the multiple-image-decoder worker. Why should they have the same timeout?

Why is it better to always multiplex one timeslice across multiple images?

Currently, we basically do

  A) 5ms decode | 5ms decode | ... | main event loop | 5ms decode | ...

But with this patch, we'll do

  B) 5ms decode | main event loop | 5ms decode | main event loop |

In either case, each 5ms decode is usually just for one image.  (In (A), if an image finishes before its 5ms is up, we move on to the next event immediately.  In (B), if an image finishes before its 5ms is up, we move on to another image for the remaining 5ms.)
Comment 38 Justin Lebar (not reading bugmail) 2012-01-12 14:22:11 PST
> +  PRUint32 maxBytes;
> +  if (!aBriefDecode) {
> +    // We really do mean it when we say "brief".
> +    maxBytes = 32;

> This should be a pref.

I'm surprised you didn't give me more grief over this, because I can't prove that 32 is the right number here, short of pointing to a green run on try.  :)  It's a correctness issue that this value be big enough, so I'm not sure it should be a pref.

> Why should we change this to 10 ms as part of this patch?

We used 5ms previously not because we wanted to yield to the event loop every 5ms, but because N images could stack up in the event loop, and we ended up yielding after 5*N-ms.  5ms was about the smallest value I could use and still give us reasonably-sized chunks to decode.

We can afford longer time-slices with this patch, since we don't have this 5*N-ms behavior.  I picked 10ms because it's under 60fps.

I can spin it off into a separate commit, if that's all you mean.  Or we can leave it at 5ms, which I'm also fine with.
Comment 39 Justin Lebar (not reading bugmail) 2012-01-12 14:31:07 PST
> Why not make [DecodeRequest] a subclass of RasterImage?

You say "composition", I say "inheritance."  I say "inheritance", you say "composition"...  I did it to match the style of DiscardTrackerNode.  But I can change it if you want.
Comment 40 Justin Lebar (not reading bugmail) 2012-01-12 15:15:40 PST
Joe explained on IRC that he meant that DecodeRequest could be an inner class of RasterImage.  That's fine with me!
Comment 41 Justin Lebar (not reading bugmail) 2012-01-12 16:30:38 PST
>@@ +2961,5 @@
>> +  // During this trip through the event loop, we'll decode only ASAP requests
>> +  // or only normal requests, but not both.  We need to spin the event loop
>> +  // when we transition from ASAP to normal requests so the ASAP requests have
>> +  // a chance to be drawn.
>> +  LinkedList<DecodeRequest> &requestList =
>
>LinkedList<DecodeRequest>& requestList

This file (and this patch) puts |*| on the variable names.  Why put |&| on the typename?

>@@ +2963,5 @@
>> +  // when we transition from ASAP to normal requests so the ASAP requests have
>> +  // a chance to be drawn.
>> +  LinkedList<DecodeRequest> &requestList =
>> +    !mASAPDecodeRequests.isEmpty() ?
>> +    mASAPDecodeRequests : mNormalDecodeRequests;
>
>Put this all on one line for ease of reading. (I'd actually prefer if/else, but that's impossible with references.)

That makes this a 121-character-long line.  I made requestList a pointer so we don't have to have this argument.  :)

>@@ +2969,5 @@
>> +  while (true) {
>> +    TimeStamp startIteration = TimeStamp::Now();
>> +
>> +    // Have we been decoding for too long?
>> +    if ((startIteration - eventStart).ToMilliseconds() >= gMaxMSBeforeYield) {
>
>I'd prefer if this while loop wasn't a while (true); perhaps a do... while loop would be better.

I did it this way so I don't have to read the time twice, at the top of the loop (to time the decode) and at the bottom of the loop, for the while() condition.

>> +    NS_DispatchToCurrentThread(this);
>> +  }
>> +
>> +  Telemetry::Accumulate(Telemetry::IMAGE_DECODE_LATENCY,
>> +                        PRUint32((TimeStamp::Now() - eventStart).ToMilliseconds()));
>
>I think you need to test whether this is a size decode before accumulating, as they throw off the numbers.

Well, except we might have done multiple things while we were here.  It could have been two size decodes, or a size decode plus a normal decode.

I think this is a pretty un-interesting piece of telemetry, tbh.  But the goal is to tell us how long we spend holding up the event loop, so maybe it should stay how it is?  You can now filter out small values in the telemetry dashboard, btw.

>@@ +3088,5 @@
>> +
>> +  // If we aren't yet finished decoding and we have more data in hand, re-post
>> +  // ourselves to the decode worker.
>> +  if (aImg->mDecoder && !aImg->IsDecodeFinished() && haveMoreData) {
>> +    RequestDecode(aImg);
>
>I think this is already handled in ::Run(), above, right?

No, but there is a bug here.

We need to RequestDecode() here so that *this particular image* is decoded again sometime; we pop the decode request off the list before calling DecodeImage().

But the code in Run() to post back to the event loop is also necessary, because it might happen that we call DecodeImage(), finish decoding the whole image (so don't call RequestDecode), and then notice that we're out of time.  In this case, we need to post ourselves back to the event loop!  That's what the code in Run() does.

However, the code in Run() should call EnsurePendingInEventLoop(), because we might already be pending; it currently assumes we're not pending, which is wrong.

>We should only do it in one spot, probably Run(). And if it *is* Run(), we should rename this function, since it only decodes part of the image.

Once you call RequestDecode() on an image, it'll get fully decoded.  It just might take a few async steps.  I guess it's kind of confusing that the method is used internally to mean "post me back and decode more of this image", but to the outside caller, this is just like recursion.
Comment 42 Bobby Holley (:bholley) (busy with Stylo) 2012-01-12 16:32:16 PST
(In reply to Justin Lebar [:jlebar] from comment #38)
> > This should be a pref.
> 
> I'm surprised you didn't give me more grief over this, because I can't prove
> that 32 is the right number here, short of pointing to a green run on try. 
> :)  It's a correctness issue that this value be big enough, so I'm not sure
> it should be a pref.

I'm a bit concerned about this. What exactly goes orange on try if this is too low?
Comment 43 Robert Claypool 2012-01-12 16:55:12 PST
What is this "yield"? Don't all the operating systems do preemptive multitasking? If not, what purpose does yielding serve on preemptive environments, and if so, why shouldn't this be designed to let the operating system handle it?
Is the answer just that the operating system isn't smart enough to?
Comment 44 Bobby Holley (:bholley) (busy with Stylo) 2012-01-12 17:14:27 PST
(In reply to Robert Claypool from comment #43)
> What is this "yield"? Don't all the operating systems do preemptive
> multitasking? If not, what purpose does yielding serve on preemptive
> environments, and if so, why shouldn't this be designed to let the operating
> system handle it?
> Is the answer just that the operating system isn't smart enough to?

Everything here is on one thread. When we say "yield" here we're just referring to "returning from the current function so that the event loop can schedule something else".
Comment 45 Joe Drew (not getting mail) 2012-01-12 17:29:07 PST
(In reply to Justin Lebar [:jlebar] from comment #41)
> >@@ +2961,5 @@
> >> +  LinkedList<DecodeRequest> &requestList =
> >
> >LinkedList<DecodeRequest>& requestList
> 
> This file (and this patch) puts |*| on the variable names.  Why put |&| on
> the typename?

New code should put * and & on typenames. We just haven't bothered changing old code.

> >@@ +2969,5 @@
> >> +  while (true) {
> >> +    TimeStamp startIteration = TimeStamp::Now();
> >> +
> >> +    // Have we been decoding for too long?
> >> +    if ((startIteration - eventStart).ToMilliseconds() >= gMaxMSBeforeYield) {
> >
> >I'd prefer if this while loop wasn't a while (true); perhaps a do... while loop would be better.
> 
> I did it this way so I don't have to read the time twice, at the top of the
> loop (to time the decode) and at the bottom of the loop, for the while()
> condition.

If you feel strongly, add a comment :)

> >> +  Telemetry::Accumulate(Telemetry::IMAGE_DECODE_LATENCY,
> >> +                        PRUint32((TimeStamp::Now() - eventStart).ToMilliseconds()));
> >
> >I think you need to test whether this is a size decode before accumulating, as they throw off the numbers.
> 
> Well, except we might have done multiple things while we were here.  It
> could have been two size decodes, or a size decode plus a normal decode.
> 
> I think this is a pretty un-interesting piece of telemetry, tbh.  But the
> goal is to tell us how long we spend holding up the event loop, so maybe it
> should stay how it is?  You can now filter out small values in the telemetry
> dashboard, btw.

Jeff, you added the test for size decodes. What do you think?

> >We should only do it in one spot, probably Run(). And if it *is* Run(), we should rename this function, since it only decodes part of the image.
> 
> Once you call RequestDecode() on an image, it'll get fully decoded.  It just
> might take a few async steps.  I guess it's kind of confusing that the
> method is used internally to mean "post me back and decode more of this
> image", but to the outside caller, this is just like recursion.

It's a little incestuous, though:

> In DecodeWorker::Run, we popFirst(), but in DecodeWorker::RequestDecode() we
> insertBack().  So we do round-robin.

I'd *much* rather it be only in Run(), rather than in two different places.
Comment 46 Justin Lebar (not reading bugmail) 2012-01-12 18:31:22 PST
> If you feel strongly, add a comment :)

Oh, I was totally out to lunch.  A do/while loop is much better here.  Sorry!

(We call TimeStamp::Now() way too much, still.)
Comment 47 Justin Lebar (not reading bugmail) 2012-01-12 18:43:58 PST
I'll change all my pointers to type* name, but only on the condition I get to stand on the soapbox for a moment:

The point of having a style guide is so that you all your code looks the same.  Consistency is much more important than any particular style point.  So it is in fact counter-productive to the basic purpose of a style guide to say "new code in existing files follows rule X; we just haven't changed the old code yet."

If the old code's style is so egregious, then it can be fixed.  But I seriously doubt that anyone is going to go through RasterImage and change all its pointers.  We all have better things to do.  So by enforcing this rule, we're actually making our style *worse*.

Thank you.  I will now proceed to do as I was told.
Comment 48 Justin Lebar (not reading bugmail) 2012-01-12 18:51:26 PST
> I'd *much* rather it be only in Run(), rather than in two different places.

This breaks the invariant you asked me to add in MarkAsASAP.  Now request->isInList() does not imply mPendingInEventLoop, because we specifically add requests to the list without ensuring that the worker is pending.
Comment 49 Justin Lebar (not reading bugmail) 2012-01-12 18:57:43 PST
> So it is in fact counter-productive to the basic purpose of a style guide to say "new code in 
> existing files follows rule X; we just haven't changed the old code yet."

Note that this is basically the same argument you made in bug 715405 comment 22.  :)
Comment 50 Joe Drew (not getting mail) 2012-01-12 19:06:30 PST
(In reply to Justin Lebar [:jlebar] from comment #48)
> > I'd *much* rather it be only in Run(), rather than in two different places.
> 
> This breaks the invariant you asked me to add in MarkAsASAP.  Now
> request->isInList() does not imply mPendingInEventLoop, because we
> specifically add requests to the list without ensuring that the worker is
> pending.

Maybe that shouldn't be possible, then:

>+    DecodeRequest *request = requestList.popFirst();
>+
>+    if (!request) {
>+      // Nothing more to decode!
>+      break;
>+    }
>+
>+    DecodeImage(request->mImage);

if (!request->mImage->IsDecoded())
  ContinueDecoding(request);

(with an obvious implementation of IsDecoded, and ContinueDecoding() calling EnsurePendingInEventLoop(), and DecodeImage renamed to DecodeSomeImageData() or whatever.)

Then you can remove the

>+  // If decode requests are pending, re-post ourself to the event loop.
>+  if (!mASAPDecodeRequests.isEmpty() || !mNormalDecodeRequests.isEmpty()) {
>+    mPendingInEventLoop = true;
>+    NS_DispatchToCurrentThread(this);
>+  }

hunk altogether.
Comment 51 Joe Drew (not getting mail) 2012-01-12 19:07:11 PST
(In reply to Justin Lebar [:jlebar] from comment #49)
> > So it is in fact counter-productive to the basic purpose of a style guide to say "new code in 
> > existing files follows rule X; we just haven't changed the old code yet."
> 
> Note that this is basically the same argument you made in bug 715405 comment
> 22.  :)

Yeah, but I'm the module owner of Imagelib. :)

(FWIW, I would accept a patch to switch us wholesale.)
Comment 52 Robert Lickenbrock [:rclick] 2012-01-16 00:27:14 PST
*** Bug 674547 has been marked as a duplicate of this bug. ***
Comment 53 Justin Lebar (not reading bugmail) 2012-01-19 14:01:16 PST
> Then you can remove the [if decode requests are pending, re-post ourself to the event loop] hunk 
> altogether.

Sorry, I don't see how this solves the problem.  To wit:

Suppose we have two pending decodes.  We call DecodeImage and decode the entirety of one image.  Then we check the time and see that we've exhausted our timeslice.

At this point, we must re-post ourselves to the event loop, because we still have a decode pending.  But we're not going to re-post ourselves on behalf of the image we just finished decoding; that is, the ContinueDecoding(request) call in comment 50 will not be hit.  So we still need the hunk at the end of comment 50.

I don't think we can remove the [repost if decodes are still pending] hunk.  We can make ContinueDecoding() not cause us to repost ourselves, and I'll post a patch with this.  I'll let you be the judge of whether it's clearer this way.
Comment 54 Justin Lebar (not reading bugmail) 2012-01-19 14:44:35 PST
Created attachment 590003 [details] [diff] [review]
Patch v4

Outstanding questions:

 * Justin: Exactly what fails when the immediate-decode size is too small? (bholley, comment 42)
 * Jeff: Is the change to IMAGE_DECODE_LATENCY ok? (joe, comment 45)
Comment 55 Justin Lebar (not reading bugmail) 2012-01-19 14:55:51 PST
One more oustanding question:

 * Joe: Is the new DecodeWorker API, where during DecodeWorker::Run, we add images to the list of pending decodes but don't post the worker to the event loop until it's done with its timeslice, better?  Note the comment in MarkAsASAP; it's now incorrect to call that during DecodeWorker::Run, because of the issue with the invariant.
Comment 56 Joe Drew (not getting mail) 2012-01-19 15:02:22 PST
> I don't think we can remove the [repost if decodes are still pending] hunk.  We
> can make ContinueDecoding() not cause us to repost ourselves, and I'll post a
> patch with this.  I'll let you be the judge of whether it's clearer this way.

I'll judge once I see the patch, but you do make a good point and I'm reasonably convinced. I think this answers comment 55; let me know if this is not the case.
Comment 57 Justin Lebar (not reading bugmail) 2012-01-19 16:15:59 PST
>  * Justin: Exactly what fails when the immediate-decode size is too small? (bholley, comment 42)

We'll find out in https://tbpl.mozilla.org/?tree=Try&rev=07afd9abe685
Comment 58 Justin Lebar (not reading bugmail) 2012-01-19 17:20:48 PST
(In reply to Justin Lebar [:jlebar] from comment #57)
> >  * Justin: Exactly what fails when the immediate-decode size is too small? (bholley, comment 42)
> 
> We'll find out in https://tbpl.mozilla.org/?tree=Try&rev=07afd9abe685

Looks green for decoding 4 bytes.  Are you curious to go smaller than that, Bobby?
Comment 59 Bobby Holley (:bholley) (busy with Stylo) 2012-01-19 17:24:26 PST
(In reply to Justin Lebar [:jlebar] from comment #58)

> Looks green for decoding 4 bytes.  Are you curious to go smaller than that,
> Bobby?

I'm not concerned with the actually number, just the question of what's limiting us here. You mentioned earlier (unless I'm mistaken) that there were failures when the number was too small, so I'm wondering (a) how small is too small, (b) what those failures are, and (c) why they fail. What happens if we write zero bytes?
Comment 60 Justin Lebar (not reading bugmail) 2012-01-19 17:30:47 PST
> You mentioned earlier (unless I'm mistaken) that there were failures when the number was too small

Yes, but in my mind, I meant that, if we don't write anything (which may not be the same as writing 0 bytes!), we know it's not going to work.  Pushed to try with 0 bytes:

https://tbpl.mozilla.org/?tree=Try&rev=53cfc1a752fe
Comment 61 Bobby Holley (:bholley) (busy with Stylo) 2012-01-19 17:35:21 PST
(In reply to Justin Lebar [:jlebar] from comment #60)
> Yes, but in my mind, I meant that, if we don't write anything (which may not
> be the same as writing 0 bytes!)

Wait, what? How so?
Comment 62 Justin Lebar (not reading bugmail) 2012-01-19 17:42:17 PST
Maybe I'm being too coy -- knowing nothing about the code, it's certainly *possible* that writing 0 bytes would be different than not invoking the decoder.  Maybe it is the same!
Comment 63 Joe Drew (not getting mail) 2012-01-19 19:41:32 PST
Comment on attachment 590003 [details] [diff] [review]
Patch v4

Review of attachment 590003 [details] [diff] [review]:
-----------------------------------------------------------------

::: image/src/RasterImage.cpp
@@ +3143,5 @@
> +  // If we aren't yet finished decoding and we have more data in hand, re-post
> +  // ourselves to the decode worker.
> +  if (aImg->mDecoder && !aImg->IsDecodeFinished() && haveMoreData) {
> +    AddDecodeRequest(&aImg->mDecodeRequest);
> +  }

Some quick feedback - Why can't this be in the Run() loop above, thereby centralizing all modification of the queues?
Comment 64 Robert Lickenbrock [:rclick] 2012-01-19 21:07:31 PST
(In reply to Joe Drew (:JOEDREW!) from comment #63) 
> Some quick feedback - Why can't this be in the Run() loop above, thereby
> centralizing all modification of the queues?

FWIW, the refactoring I want to do in bug 718927 would be cleaner if this stayed as part of the decoding code. I'm planning on moving the decoding code here to a method on the RasterImage, after which it won't necessarily be called only by the DecoderWorker.

For example, in bug 685516 we want to sync decode for a timeslice in RasterImage::RequestDecode() then decode the rest asynchronously. After the synchronous decoding yields, we need to check if we decoded all the data and invoke more decoding if we haven't. As part of the decoding code, this would only need to happen in once place rather than being duplicated at each callsite.
Comment 65 Robert Lickenbrock [:rclick] 2012-01-19 21:09:52 PST
(In reply to Robert Lickenbrock [:rclick] from comment #64)
> this would only need to happen in once place rather than being
> duplicated at each callsite.
s/once/one
Comment 66 Justin Lebar (not reading bugmail) 2012-01-19 21:39:37 PST
> Some quick feedback - Why can't this be in the Run() loop above, thereby centralizing all 
> modification of the queues?

I think that's a good idea, at least from the perspective of this patch.  I'm tempted to do it; Robert, do you mind changing it when you do your refactoring?  I hope it won't mess you up too much.
Comment 67 Justin Lebar (not reading bugmail) 2012-01-19 21:41:38 PST
> Maybe I'm being too coy -- knowing nothing about the code, it's certainly *possible* that writing 0 
> bytes would be different than not invoking the decoder.  Maybe it is the same!

Indeed, sync-writing 0 bytes is much, much worse than just not writing anything at all.  0 bytes turns almost every test suite orange, whereas not writing at all just failed a few suites.

https://tbpl.mozilla.org/?tree=Try&rev=53cfc1a752fe
Comment 68 Robert Lickenbrock [:rclick] 2012-01-19 22:50:09 PST
(In reply to Justin Lebar [:jlebar] from comment #66)
> I think that's a good idea, at least from the perspective of this patch. 
> I'm tempted to do it; Robert, do you mind changing it when you do your
> refactoring?  I hope it won't mess you up too much.

Changing it if I need to won't be a problem. Go ahead and do it.

(In reply to Justin Lebar [:jlebar] from comment #67)
> Indeed, sync-writing 0 bytes is much, much worse than just not writing
> anything at all.  0 bytes turns almost every test suite orange, whereas not
> writing at all just failed a few suites.

The orange test suites all crash somewhere under DecodeImage(), so I suspect that writing 0 bytes to the decoder is broken in general.

What kind of failures are there if nothing is written at all?
Comment 69 Justin Lebar (not reading bugmail) 2012-01-20 07:21:12 PST
> What kind of failures are there if nothing is written at all?

From comment 14: https://tbpl.mozilla.org/?tree=Try&rev=eb29adb5609e
Comment 70 Bobby Holley (:bholley) (busy with Stylo) 2012-01-20 07:58:24 PST
(In reply to Justin Lebar [:jlebar] from comment #69)
> > What kind of failures are there if nothing is written at all?
> 
> From comment 14: https://tbpl.mozilla.org/?tree=Try&rev=eb29adb5609e

So it causes us to report an incorrect size? This is probably indicative of a bug, since the decoders are supposed to wait until they have enough data before parsing out a size. Do we need a check? An assert? I just want to make sure that we don't sweep this issue under the rug. It might be the kind of thing that could cause intermittent orange.
Comment 71 Justin Lebar (not reading bugmail) 2012-01-20 08:57:34 PST
> So it causes us to report an incorrect size?

What makes you say that?

I'd looked at [1], which is a reftest failure caused by us not noticing that an image is malformed, without the brief sync-decode.

[1] https://tbpl.mozilla.org/php/getParsedLog.php?id=8428944&tree=Try#error0
Comment 72 Bobby Holley (:bholley) (busy with Stylo) 2012-01-20 09:04:58 PST
(In reply to Justin Lebar [:jlebar] from comment #71)
> > So it causes us to report an incorrect size?
> 
> What makes you say that?

https://tbpl.mozilla.org/php/getParsedLog.php?id=8428494&tree=Try
Comment 73 Justin Lebar (not reading bugmail) 2012-01-20 13:10:28 PST
> This is probably indicative of a bug, since the decoders are supposed to wait until they have enough 
> data before parsing out a size.

It could be that we don't have a size but onload gets called anyway, and then content notices that we don't have the size.

Is there a better way to do a sync size decode than sync decoding 8 bytes of the image?
Comment 74 Bobby Holley (:bholley) (busy with Stylo) 2012-01-20 13:27:46 PST
(In reply to Justin Lebar [:jlebar] from comment #73)
> > This is probably indicative of a bug, since the decoders are supposed to wait until they have enough 
> > data before parsing out a size.
> 
> It could be that we don't have a size but onload gets called anyway, and
> then content notices that we don't have the size.

In the current world, onload doesn't fire until all images fire the OnStopDecode+OnStopRequest siamese twins. This doesn't happen until a full size decode has happened, which means that every byte of data needs to have passed through the decoder, even if the decoder ignored most of it (because it was in size decode mode).

So we've got some kind of bug here.
Comment 75 Joe Drew (not getting mail) 2012-01-20 15:12:03 PST
*** Bug 671641 has been marked as a duplicate of this bug. ***
Comment 76 Justin Lebar (not reading bugmail) 2012-01-22 07:23:03 PST
> In the current world, onload doesn't fire until all images fire the OnStopDecode+OnStopRequest 
> siamese twins. This doesn't happen until a full size decode has happened, which means that every 
> byte of data needs to have passed through the decoder, even if the decoder ignored most of it 
> (because it was in size decode mode).

Are you sure this is correct?

AIU the current code, onload is unblocked by SourceDataComplete.  And SourceDataComplete invokes the decoder for as long as it wants to run; in particular, the decoder might not run to completion.  If we take out this invocation, then tests fail as in comment 72 -- this indicates to me that the decoder is not looking at every byte of data before onload.
Comment 77 Bobby Holley (:bholley) (busy with Stylo) 2012-01-22 14:02:02 PST
(In reply to Justin Lebar [:jlebar] from comment #76)
> AIU the current code, onload is unblocked by SourceDataComplete.

Not really. It's called from Decoder::PostDecodeDone() (superclass method), which is called at different times for different decoder implementations. In particular, some decoders call it when they detect the end of useful data (like the png and jpg decoders), while others call it in FinishInternal, which gets called when the decoder gets shut down. This only happens in SourceDataComplete() if we're not storing source data (which we almost always are). I'm pretty sure that the common case for decoder shutdown (in the BMP case) is in imgDecodeWorker::Run(), once all the bytes have been written through.
Comment 78 Robert Lickenbrock [:rclick] 2012-01-23 03:45:10 PST
(In reply to Bobby Holley (:bholley) from comment #74)
> In the current world, onload doesn't fire until all images fire the
> OnStopDecode+OnStopRequest siamese twins. This doesn't happen until a full
> size decode has happened, which means that every byte of data needs to have
> passed through the decoder, even if the decoder ignored most of it (because
> it was in size decode mode).

The patch here makes the decoding in AddSourceData() and SourceDataComplete() asynchronous, so there's no guarantee the size decode will finish before OnStopDecode+OnStopRequest (and thus onload) fire.


(In reply to Justin Lebar [:jlebar] from comment #73)
> Is there a better way to do a sync size decode than sync decoding 8 bytes of
> the image?

One way of doing it is to check what type of decoder you have in SourceDataComplete().

If it's a size decoder, synchronously write all the data in one big chunk. The decoder will bail as soon as it's parsed and posted the dimensions, so this is cheap and shouldn't be a performance concern.

AFAICT you should be able to do things asynchronously if it's a full decoder. In that case, Decoder::Init() fires OnStartDecode which triggers onload blocking until the decode finishes.
Comment 79 Justin Lebar (not reading bugmail) 2012-01-23 07:54:53 PST
> One way of doing it is to check what type of decoder you have in SourceDataComplete().

I'd tried this, but it does not work.  In [1], test_fileapi_slice passes, but test_bug369370 fails in the same way.

This suggests that the issue is not only with size decodes.

[1] https://tbpl.mozilla.org/php/getParsedLog.php?id=8428494&tree=Try
Comment 80 Justin Lebar (not reading bugmail) 2012-01-23 08:25:37 PST
I talked with bholley about this on IRC, and it turns out that we should ignore most of what he's posted here:

<bholley> jlebar: I was looking at the other siamese twins :-(
<bholley> jlebar: I was looking at the internal OnStopDecoder/OnStopContainer, rather than the external OnStopDecode/OnStopRequest
<bholley> jlebar: bottom line - my comment was wrong, OnStopDecode is fired on imgRequest::OnStopRequest

So this business about onload not being fired until the full decode has happened isn't correct.

There's still the question of: How many bytes should we decode in SourceDataComplete, and why is it necessary to do this at all?
Comment 81 Bobby Holley (:bholley) (busy with Stylo) 2012-01-23 08:29:42 PST
(In reply to Justin Lebar [:jlebar] from comment #80)
> I talked with bholley about this on IRC, and it turns out that we should
> ignore most of what he's posted here

Only starting from comment 74. Everything else should be valid.
Comment 82 Justin Lebar (not reading bugmail) 2012-01-23 08:34:24 PST
It seems that we should decode at least as many bytes as we need for the longest mime-type sniff.  32 ought to be more than enough for now and the conceivable future, right?
Comment 83 Justin Lebar (not reading bugmail) 2012-01-23 08:45:21 PST
Created attachment 590729 [details] [diff] [review]
Patch v5

I reset the brief decode length to 32, since that should be more than long enough for any MIME type sniff, and made it a constant.
Comment 84 Justin Lebar (not reading bugmail) 2012-01-23 14:39:03 PST
Just tested on my Mac, and this patch apparently delays tab switching until all images on the tab are decoded.  :-/
Comment 85 Justin Lebar (not reading bugmail) 2012-01-23 15:06:49 PST
Whether the delay is long or not seems to be timing-dependent, at least on my Linux box.  It's usually pretty fast to switch tabs, but sometimes it's slow like on my mac.  (On the mac, it appears to be consistently very slow.)
Comment 86 Justin Lebar (not reading bugmail) 2012-01-23 17:13:06 PST
> +  } while ((TimeStamp::Now() - eventStart).ToMilliseconds() >= gMaxMSBeforeYield);

Oh, well that would do it.  <= works much better.
Comment 87 Justin Lebar (not reading bugmail) 2012-01-23 17:57:49 PST
Created attachment 590957 [details] [diff] [review]
Part 1, v6

Review: :joedrew!

Tested on Linux and Mac; this works far better than any previous version.  :)
Comment 88 Justin Lebar (not reading bugmail) 2012-01-24 12:07:02 PST
In case it got lost in the noise here, Jeff, the f? is on changing Telemetry::IMAGE_DECODE_LATENCY, per comment 45.
Comment 89 Joe Drew (not getting mail) 2012-01-24 12:49:13 PST
Comment on attachment 590957 [details] [diff] [review]
Part 1, v6

Review of attachment 590957 [details] [diff] [review]:
-----------------------------------------------------------------

::: image/src/RasterImage.cpp
@@ +176,5 @@
>  }
>  
>  namespace mozilla {
>  namespace image {
> +namespace internal {

Can you move this to be a child class of RasterImage?

class RasterImage {
class DecodeWorker {
//

@@ +181,5 @@
> +
> +/* DecodeWorker is a singleton class which we use when decoding large images.
> + *
> + * When we decode an image larger than gMaxBytesForSyncDecode, we call
> + * DecodeWorker::RequestDecode() for the image.  The DecodeWorker instance

Here you should reword to make it a little more clear that the image gets added to a queue.

@@ +199,5 @@
> +
> +    /**
> +     * Ask the DecodeWorker to asynchronously decode some of this image.
> +     */
> +    void RequestDecode(RasterImage* aImg);

It's really an implementation detail of this function that it only decodes some of the image, right?

If we really want to make a distinction, we could have a private function to enqueue an image, and have this just call that.

@@ +258,5 @@
> +    /* Decode some chunks of the given image.  If aDecodeType is brief, decode
> +     * just a few bytes if no bytes have been decoded yet -- do just enough for
> +     * us to figure out if there's an error in the image. */
> +    nsresult DecodeImage(RasterImage* aImg,
> +                         DecodeType aDecodeType = DECODE_TYPE_NORMAL);

DecodeSomeOfImage? Decode5msOfImage()? You should change this name, as it doesn't decode the whole image, it just decodes until it runs out of time.

@@ +261,5 @@
> +    nsresult DecodeImage(RasterImage* aImg,
> +                         DecodeType aDecodeType = DECODE_TYPE_NORMAL);
> +
> +    LinkedList<DecodeRequest> mASAPDecodeRequests;
> +    LinkedList<DecodeRequest> mNormalDecodeRequests;

Right above here, can you add |private: // members| (and similarly put |private: // statics| and |private: // methods|)? I like the separation.

@@ +1658,5 @@
>    }
>  
> +  // If there's a decoder open, synchronously decode the beginning of the image
> +  // to check for errors.  (If the beginning has already been decoded, this
> +  // does nothing.)  Then kick off an async decode of the rest of the image.

Given this explicit behaviour (I recognize that this is not a change), I would like to see a test that documents and ensures what we do with onload and onerror in the case of images that are invalid in the first 32 bytes and invalid past that.

@@ +3016,5 @@
> +    // this request to the back of the list.
> +    if (image->mDecoder &&
> +        !image->mError &&
> +        !image->IsDecodeFinished() &&
> +        image->mSourceData.Length() > image->mBytesDecoded) {

It would be nice to encapsulate this in a RasterImage::IsFinishedDecoding() or similar.

Actually, why doesn't RasterImage::IsDecodeFinished() suffice?

@@ +3071,5 @@
> +    // Size decodes are cheap and we more or less want them to be
> +    // synchronous. Write all the data in that case, otherwise write a
> +    // chunk.
> +    maxBytes = aImg->mDecoder->IsSizeDecode()
> +      ? aImg->mSourceData.Length() : gDecodeBytesAtATime;

Can you make this an if? The ternary operator makes my tummy hurt.

Alternately, put this on the same line.
Comment 90 Justin Lebar (not reading bugmail) 2012-01-24 13:40:29 PST
> +    // this request to the back of the list.
> +    if (image->mDecoder &&
> +        !image->mError &&
> +        !image->IsDecodeFinished() &&
> +        image->mSourceData.Length() > image->mBytesDecoded) {
>
> It would be nice to encapsulate this in a RasterImage::IsFinishedDecoding() or similar.
>
> Actually, why doesn't RasterImage::IsDecodeFinished() suffice?

Well, you need the mDecoder check, because IsDecodeFinished() doesn't work without that.  And it's not clear that you don't need the !mError clause.

However, I can certainly get rid of mSourceData clause.  Would that be sufficient?
Comment 91 Justin Lebar (not reading bugmail) 2012-01-24 13:45:22 PST
> Given this explicit behaviour (I recognize that this is not a change), I would like to see a test 
> that documents and ensures what we do with onload and onerror in the case of images that are invalid 
> in the first 32 bytes and invalid past that.

Which one of these "invalid"s should be "valid"?
Comment 92 Joe Drew (not getting mail) 2012-01-24 14:07:14 PST
Neither! I want tests for onload and/or onerror in the case of:
 - Valid images
 - Invalid image in the first 32 bytes
 - Invalid image past the first 32 bytes
Comment 93 Justin Lebar (not reading bugmail) 2012-01-25 08:52:46 PST
>  - Valid images

I presume this is already covered e.g. by any simple reftest which uses a valid image?

>  - Invalid image in the first 32 bytes

I think this is covered by layout/reftests/object/malformed-should-fallback.html.  It does

  <object data="extra/malformed_image.png">PASS</object>

where the contents of malformed_image.png are "FAIL".

>  - Invalid image past the first 32 bytes

I'll add a test for this one.
Comment 94 Justin Lebar (not reading bugmail) 2012-01-25 08:54:49 PST
> where the contents of malformed_image.png are "FAIL".

That is, the bytes of malformed_image.png are the ASCII characters "FAIL".
Comment 95 Joe Drew (not getting mail) 2012-01-25 09:41:21 PST
No, sorry, I specifically want tests to make sure that onload *does* fire and onerror *does not* fire on a valid image, that <img src="extra/malformed_image.png"> does fire onerror (I don't remember whether onload fires on it too), and to test and document what happens for <img src="extra/big_malformed_image.png"> wrt onload/onerror.
Comment 96 Justin Lebar (not reading bugmail) 2012-01-25 09:48:57 PST
Okay.

WRT <object data="big_malformed_image.png">FOO</object>, it looks like this patch introduces a change in behavior.  AFAICT, we'd previously fall back to "FOO" if the error occurred within the first decoded-chunk of the image (as much as we can decode in 5ms), but now we'll fall back to "FOO" only if the error  occurs within the first 32 bytes of the image.

I don't think there's a change in onload/onerror behavior, however.
Comment 97 Justin Lebar (not reading bugmail) 2012-01-25 10:33:15 PST
By the way, exactly what constitutes an "error" is somewhat wishy-washy, and probably depends on the specifics of the decoder.  I was able to make a PNG which rendered only halfway, but didn't trigger an error.

So a site which relies on onerror being fired for any erroneous image is already screwed.
Comment 98 Justin Lebar (not reading bugmail) 2012-01-25 10:55:50 PST
Created attachment 591543 [details] [diff] [review]
Part 0, v1: onload/onerror tests

Joe, is this what you had in mind?  These tests pass with and without the patch here.
Comment 99 Joe Drew (not getting mail) 2012-01-25 11:27:24 PST
Crap, forgot to send this message:

(In reply to Justin Lebar [:jlebar] from comment #96)
> Okay.
> 
> WRT <object data="big_malformed_image.png">FOO</object>, it looks like this
> patch introduces a change in behavior.  AFAICT, we'd previously fall back to
> "FOO" if the error occurred within the first decoded-chunk of the image (as
> much as we can decode in 5ms), but now we'll fall back to "FOO" only if the
> error  occurs within the first 32 bytes of the image.
> 
> I don't think there's a change in onload/onerror behavior, however.

OK. I'm fine with changing behaviour, but we should make sure we don't accidentally change it further. Be sure to add/modify tests to ensure we know our current state wrt both onload/onerror and fallbacks.
Comment 100 Joe Drew (not getting mail) 2012-01-25 11:34:49 PST
Comment on attachment 591543 [details] [diff] [review]
Part 0, v1: onload/onerror tests

Review of attachment 591543 [details] [diff] [review]:
-----------------------------------------------------------------

This is great for testing onload/onerror.
Comment 101 Justin Lebar (not reading bugmail) 2012-01-25 11:43:41 PST
<jrmuizel> jlebar: can you summarize the telemetry change?
<jlebar> jrmuizel, So there's IMAGE_DECODE_LAG, or whatever it's called.  It measures how long we sit in the event loop before yielding.
<jlebar> At the moment, if we decode a bunch of small images, we'll get a bunch of small values added to that histogram.
<jlebar> Also, at the moment, size decodes are ignored for the purposes of the histogram.
<jlebar> With the patch, the histogram still tells you how long we sit in the event loop before yielding.
<jlebar> But we may decode multiple images before yielding.
<jlebar> So if you decode a bunch of small images, you may get just one large value put into the histogram.
<jlebar> Similarly, we may do some size decodes and some normal decodes in the same trip through the event loop.
<jlebar> So I no longer ignore size decodes.
<jlebar> IMO, the point of the histogram is to say "Here's how long we blocked the event loop for."  So these changes seem fine to me.
<jlebar> In fact, it's a much better histogram now, because before, we might queue up 10 workers  each blocking the event loop for 5ms at a time.  THe histogram would say "5ms", but in fact, the event loop would be blocked for 50ms.
<jlebar> Anyway, that's not much of a summary.
<jrmuizel> jlebar: feedback+
Comment 102 Justin Lebar (not reading bugmail) 2012-01-25 12:04:47 PST
Created attachment 591579 [details] [diff] [review]
Part 2: Add <object> fallback test.

Are we good to go after this patch?  I guess I should push to try once more.
Comment 103 Justin Lebar (not reading bugmail) 2012-01-25 15:52:15 PST
> +  if (!aDecodeType == DECODE_TYPE_BRIEF) {

This stray "!" appeared when I first introduced brief decodes, meaning that we were never actually decoding 32 bytes, as I thought; we were always doing 4096.

Tests fail when we sync-decode 40 or fewer bytes.  41 seems to work, but I haven't run all the tests.  I'm going to try to pinpoint exactly why.
Comment 104 Mozilla RelEng Bot 2012-01-25 17:00:26 PST
Try run for 357c8e8ee05b is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=357c8e8ee05b
Results (out of 61 total builds):
    exception: 13
    success: 34
    warnings: 14
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-357c8e8ee05b
Comment 105 Robert Lickenbrock [:rclick] 2012-01-25 17:56:51 PST
(In reply to Justin Lebar [:jlebar] from comment #90)
> However, I can certainly get rid of mSourceData clause.
I'm pretty sure that it's needed for the case where the image isn't finished downloading and all available data has been decoded. RasterImage::IsDecodeFinished() returns false in that case.
Comment 106 Justin Lebar (not reading bugmail) 2012-01-25 18:18:32 PST
Ah, you're right!  I had to write out all the boolean logic, but it comes out to:

!IsDecodeFinished() iff

  !(IsSizeDecode && mHasSize) && !(!IsSizeDecode() && mDecoded) &&
  (!mHasSourceData || mBytesDecoded != mSourceData.Length())

!mHasSourceData means "we haven't received all the image data."  But we only want to enqueue another decode if we have more source data to decode in hand.  That is, we want !IsDecodeFinished() && mBytesDecoded != mSourceData.Length().

> +    if (image->mDecoder &&
> +        !image->mError &&
> +        !image->IsDecodeFinished() &&
> +        image->mSourceData.Length() > image->mBytesDecoded) {

Joe, I can write this as a separate method, like you asked, but I think this bug demonstrates that it would be better to leave it as above, or even to inline the IsDecodeFinished() call.  Too much DeMorgan's law for one codeblock.
Comment 107 Joe Drew (not getting mail) 2012-01-25 18:23:11 PST
Why do we have to test for mDecoder and !mError, though?
Comment 108 Justin Lebar (not reading bugmail) 2012-01-25 18:44:56 PST
Per comment 90, you need mDecoder because it's incorrect to call IsDecodeFinished() with !mDecoder.  We might not have had a decoder to begin with, or it might have been shut down when we finished decoding.

Similarly, we might have had mError == true before we started decoding, or we might have hit that during the decode.
Comment 109 Justin Lebar (not reading bugmail) 2012-01-25 18:50:27 PST
WRT the 40/41 bytes issue from comment 103: With 40 bytes, the sync decode doesn't end up calling nsPNGDecoder::info_callback, whereas with 41 bytes, we get the info callback.

The info callback does a number of things, but in particular, it reads the image's size.  (The decode in question is a normal decode, not a size decode.)
Comment 110 Justin Lebar (not reading bugmail) 2012-01-25 18:54:09 PST
(In reply to Joe Drew (:JOEDREW!) from comment #107)
> Why do we have to test for mDecoder and !mError, though?

Answered another way: Because we have if(mDecoder && !mError) at the beginning of DecodeWorker::DecodeSomeData() -- if DecodeSomeData bails on !mDecoder || mError, we certainly shouldn't re-enqueue the image.
Comment 111 Justin Lebar (not reading bugmail) 2012-01-25 19:02:59 PST
I think I understand what's going on: If we don't brief-decode enough bytes, then we don't read the image size before firing onload, and tests break.  This only happened to work in our current code; the decoder can decode synchronously as much or as little as it wants, but it always decodes at least 4092 bytes, which is plenty.

I think the solution is to make brief decodes decode until they get the size or they run out of data.
Comment 112 Joe Drew (not getting mail) 2012-01-25 19:05:55 PST
That's what size decodes are for, though.
Comment 113 Justin Lebar (not reading bugmail) 2012-01-25 19:12:48 PST
But it doesn't look like the code is at all set up to do a sync size decode while we're in the middle of a normal decode.
Comment 114 Joe Drew (not getting mail) 2012-01-25 19:42:07 PST
Right, ok. We do size decodes ahead of time for almost all images (except chrome:, res: and multipart/x-mixed-replace), but those size decodes would almost immediately be interrupted by regular decodes for any image that's being drawn.

So, I think what you should do is simplify a bit by changing SyncDecodeBeginning to:
1) Return immediately if gDecodeBytesAtATime bytes have already been decoded;
2) Write as many bytes to the decoder as required to have decoded gDecodeBytesAtATime.

Alternate #2: just flush gDecodeBytesAtATime bytes to the decoder.
Comment 115 Justin Lebar (not reading bugmail) 2012-01-25 19:57:18 PST
What's the advantage of doing that as opposed to decoding, say, 64 bytes at a time until we have a size or hit gDecodeBytesAtATime?  We suspect that 64 should be sufficient for most cases.

I ask because I recall that decoding even 4096 bytes of a PNG can take a fair bit of time.  Actually...we have telemetry, so we don't have to guess.  About 3% of images decode at less than 500KB (compressed) per second, so decoding 4KB could take 1ms.  Multiply that by a few images and we're talking real money...
Comment 116 Joe Drew (not getting mail) 2012-01-25 20:01:52 PST
I'm just looking to reduce the amount of code (and hence complexity), if possible. Maybe we can add some telemetry to find out how long a short decode actually takes? I don't feel super-strongly about it, but I'd just rather make things a little prettier.
Comment 117 Mozilla RelEng Bot 2012-01-26 00:30:18 PST
Try run for 5c8429d15c91 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=5c8429d15c91
Results (out of 209 total builds):
    success: 168
    warnings: 40
    other: 1
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-5c8429d15c91
Comment 118 Justin Lebar (not reading bugmail) 2012-01-26 09:10:15 PST
Created attachment 591822 [details] [diff] [review]
Interdiff part 1 v6 --> v7
Comment 119 Justin Lebar (not reading bugmail) 2012-01-26 09:10:43 PST
Comment on attachment 591579 [details] [diff] [review]
Part 2: Add <object> fallback test.

This test no longer applies with part 1 v7.
Comment 120 Justin Lebar (not reading bugmail) 2012-01-26 09:11:07 PST
Created attachment 591823 [details] [diff] [review]
Part 1 v7
Comment 121 Joe Drew (not getting mail) 2012-01-26 12:04:18 PST
Comment on attachment 591823 [details] [diff] [review]
Part 1 v7

Review of attachment 591823 [details] [diff] [review]:
-----------------------------------------------------------------

::: image/src/RasterImage.cpp
@@ +2953,5 @@
> +  nsresult rv = DecodeSomeOfImage(aImg, DECODE_TYPE_UNTIL_SIZE);
> +
> +  // This assertion must be non-fatal. DECODE_TYPE_UNTIL_SIZE makes an effort
> +  // to find the image's size, but makes no guarantees.
> +  NS_ASSERTION(aImg->mHasSize, "Did not find image's size after SyncDecodeUntilHaveSize.");

I claim it should not exist since we cannot guarantee anything. A debug-only printout could work if you're looking to catch bugs.

Alternately, abort if false on NS_FAILED(rv) || aImg->mHasSize.

@@ +3016,5 @@
> +      return rv;
> +    }
> +
> +    // If we're an UNTIL_SIZE decode and we got the image's size, we're done
> +    // here.

Explicitly document this as an early exit optimization.

@@ +3021,5 @@
> +    if (aDecodeType == DECODE_TYPE_UNTIL_SIZE && aImg->mHasSize)
> +      break;
> +
> +    // Figure out if we still have more data.
> +    haveMoreData = aImg->mSourceData.Length() > aImg->mBytesDecoded;

You could remove haveMoreData altogether, and just inline this into the while() check. Though that *would* be a change in behaviour, so it might be best to do as a separate bug with its own regression range and try run.

::: image/src/RasterImage.h
@@ +453,5 @@
> +     *
> +     * @return NS_ERROR if an error is encountered, and NS_OK otherwise.  (Note
> +     *         that we return NS_OK even when the size was not found.)
> +     */
> +    nsresult SyncDecodeUntilHaveSize(RasterImage* aImg);

SyncDecodeForSize()?
SyncDecodeUntilSizeAvailable()?
Comment 122 Justin Lebar (not reading bugmail) 2012-01-26 12:47:40 PST
> A debug-only printout could work if you're looking to catch bugs.

I was hoping to put in something that Jesse could catch with his fuzzer.  But tests will fail if we don't have the size, so this is probably ok with no warning.

> Explicitly document this as an early exit optimization.

You mean in RasterImage.h, right?

> You could remove haveMoreData altogether, and just inline this into the while() check. Though that 
> *would* be a change in behaviour, so it might be best to do as a separate bug with its own 
> regression range and try run.

I'll be gung-ho about it.  DecodeSomeData won't do anything if !haveMoreData, so I think we're probably OK.  Maybe.  :)
Comment 123 Joe Drew (not getting mail) 2012-01-26 13:24:08 PST
(In reply to Justin Lebar [:jlebar] from comment #122)
> > Explicitly document this as an early exit optimization.
> 
> You mean in RasterImage.h, right?

No, right above the code. Just trying to keep special cases to a minimum, and justify them where they aren't!
Comment 124 Justin Lebar (not reading bugmail) 2012-01-26 13:27:16 PST
   1.624 +    // If we're an UNTIL_SIZE decode and we got the image's size, we're done
   1.625 +    // here.
   1.626 +    if (aDecodeType == DECODE_TYPE_UNTIL_SIZE && aImg->mHasSize)
   1.627 +      break;

You want this comment to say "Early-exit optimization: If we're an UNTIL_SIZE decode ..."?  Or you want a comment just above the loop (one's already there)?
Comment 125 Joe Drew (not getting mail) 2012-01-26 13:29:07 PST
The former.
Comment 126 Justin Lebar (not reading bugmail) 2012-01-26 13:37:30 PST
How about I fix this as part of bug 721510?
Comment 127 Joe Drew (not getting mail) 2012-01-26 14:08:36 PST
Sure! Whatever makes most sense.
Comment 130 Justin Lebar (not reading bugmail) 2012-01-31 09:31:25 PST
*** Bug 671641 has been marked as a duplicate of this bug. ***
Comment 131 Justin Lebar (not reading bugmail) 2012-02-01 10:54:34 PST
I think this needs to be backed out of Aurora and nightly, due to:

Bug 723053 (possibly security-sensitive crash)
Bug 721917 (intermittent reftest orange)

Do I need to get approval for the backout?
Comment 132 Dão Gottwald [:dao] 2012-02-01 11:00:41 PST
(In reply to Justin Lebar [:jlebar] from comment #131)
> Do I need to get approval for the backout?

yep
Comment 133 Justin Lebar (not reading bugmail) 2012-02-01 11:11:26 PST
Created attachment 593527 [details]
Back out this and bug 721510

[Approval Request Comment]
Back out patch and dependent change due to regressions (see comment 131).  No string changes, reverting after just a few days to known-good state, so low-risk.
Comment 134 Jim Jeffery not reading bug-mail 1/2/11 2012-02-01 11:41:25 PST
Why is this a security issue for Aurora, and not the rest of the Nightly Users Audience ?
Comment 135 Justin Lebar (not reading bugmail) 2012-02-01 11:57:30 PST
It is a (potential) security issue for everyone.  That's why I said "I think this needs to be backed out of Aurora *and nightly*".

I need approval to back out of Aurora, and I asked Joe for a review on the backout from Nightly.
Comment 136 Joe Drew (not getting mail) 2012-02-01 13:37:07 PST
Comment on attachment 593527 [details]
Back out this and bug 721510

sure
Comment 137 Justin Lebar (not reading bugmail) 2012-02-01 16:43:32 PST
I may have a fix in hand for all our problems in bug 723053 (sorry, for those of you without access).  So I'm going to test it out and wait to see whether we need to back out on nightly.

Release drivers: We may still want to back out in Aurora, given how many things were broken with this patch.  But check out the patch in bug 723053; it's simply adding a strong reference.
Comment 138 Justin Lebar (not reading bugmail) 2012-02-01 20:54:59 PST
Unfortunately the fix for the crasher in bug 723053 does not fix the random oranges.
Comment 139 Alex Keybl [:akeybl] 2012-02-02 07:09:23 PST
Comment on attachment 593527 [details]
Back out this and bug 721510

[Triage Comment]
Approved for Aurora 12 to get us back to a known working state.
Comment 140 Justin Lebar (not reading bugmail) 2012-02-02 08:00:55 PST
Landed backout on inbound: https://hg.mozilla.org/integration/mozilla-inbound/rev/350ba395c507
Comment 141 Ed Morley [:emorley] 2012-02-03 11:16:08 PST
(In reply to Justin Lebar [:jlebar] from comment #140)
> Landed backout on inbound:
> https://hg.mozilla.org/integration/mozilla-inbound/rev/350ba395c507

https://hg.mozilla.org/mozilla-central/rev/350ba395c507
Comment 142 Justin Lebar (not reading bugmail) 2012-02-04 09:50:54 PST
Backed out on Aurora: https://hg.mozilla.org/releases/mozilla-aurora/rev/63278b4cbe87
Comment 143 Justin Lebar (not reading bugmail) 2012-02-05 20:47:00 PST
New try push: https://tbpl.mozilla.org/?tree=Try&rev=2650d56d540c
Comment 144 Justin Lebar (not reading bugmail) 2012-02-06 09:16:07 PST
It looks like my try push may fix all of the intermittent reftest failures except

  image/test/reftest/bmp/bmp-corrupted/wrapper.html?invalid-compression.bmp | image comparison (==)
Comment 145 Justin Lebar (not reading bugmail) 2012-02-06 20:17:07 PST
iinteresting.  My suspicion is that a call to FlushInvalidations is causing this failure.  So I put printfs before the calls to FlushInvalidations in RasterImage.  Sure enough, I saw this in a failing test:

XXX RasterImage::DecodeWorker::Run calling FlushInvalidations. mURI=file:///Users/cltbld/talos-slave/test/build/reftest/tests/image/test/reftest/bmp/bmp-corrupted/invalid-compression-RLE4.bmp, mSourceData.Length()=246, mBytesDecoded=246, mDecoded=0, mDecoder=0x112d49300
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/image/test/reftest/bmp/bmp-corrupted/wrapper.html?invalid-compression-RLE4.bmp | image comparison (==)

The FlushInvalidations call is missing in non-failing runs.
Comment 146 Justin Lebar (not reading bugmail) 2012-02-08 11:14:25 PST
Oh...I bet I know what the problem is.  When we do an UNTIL_SIZE decode, we should not flush invalidations.

https://tbpl.mozilla.org/?tree=Try&rev=7426ebe91a3f
Comment 147 Justin Lebar (not reading bugmail) 2012-02-08 22:11:40 PST
Created attachment 595656 [details] [diff] [review]
Part 2: Fix crash, randomorange
Comment 148 Joe Drew (not getting mail) 2012-02-09 07:55:04 PST
Comment on attachment 595656 [details] [diff] [review]
Part 2: Fix crash, randomorange

Review of attachment 595656 [details] [diff] [review]:
-----------------------------------------------------------------

::: image/src/RasterImage.cpp
@@ +3020,5 @@
>    }
>  
> +  // Flush invalidations after we've decoded all the chunks we're going to.
> +  // But if this was an UNTIL_SIZE decode, don't flush invalidations; wait for
> +  // a real decode before doing that.

Slight rewording:

For non-size decodes, flush invalidations after we've decoded all the chunks we're going to. Size decodes shouldn't cause paints.

@@ +3023,5 @@
> +  // But if this was an UNTIL_SIZE decode, don't flush invalidations; wait for
> +  // a real decode before doing that.
> +  //
> +  // Similarly, don't flush invalidations if we have all the source data; this
> +  // disables progressive display of previously-discarded images (which lets us

This might need a rewording since we no longer have the parallelism above.
Comment 149 Justin Lebar (not reading bugmail) 2012-02-09 13:22:35 PST
Fingers crossed: https://hg.mozilla.org/integration/mozilla-inbound/rev/581a0260b08b
Comment 150 Ed Morley [:emorley] 2012-02-10 05:27:13 PST
https://hg.mozilla.org/mozilla-central/rev/581a0260b08b

Note You need to log in before you can comment on or make changes to this bug.