Closed Bug 507902 Opened 15 years ago Closed 15 years ago

nsImageFrame static Icon Loads should not use the mListener of the first instantiated nsImageFrame

Categories

(Core :: Layout: Images, Video, and HTML Frames, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla1.9.3a1

People

(Reporter: bholley, Assigned: bholley)

References

Details

Attachments

(3 files, 10 obsolete files)

In nsImageFrame, the first initialized nsImageFrame instantiates the static nsImageFrame::gIconLoad, which contains 2 image requests, one for the "loading" icon and one for the "broken" icon.

http://mxr.mozilla.org/mozilla-central/source/layout/generic/nsImageFrame.cpp#248

However, in LoadIcons, the nsImageListener (helper class) belonging to the "this" instance of nsImageFrame is handed to the static icon loader as a parameter:
http://mxr.mozilla.org/mozilla-central/source/layout/generic/nsImageFrame.cpp#1691

Ironically, this parameter never actually gets used (another bug), because the actual call to LoadImage happens in a nsImageFrame method, not an IconLoad method. Since the locals are called 'mListener' for both classes, this went unnoticed.

http://mxr.mozilla.org/mozilla-central/source/layout/generic/nsImageFrame.cpp#1631

The problem here is that the imgRequest (well, technically imgRequestProxy) is stored in a static variable, so it outlives the nsImageFrame and associated listener that kicks things off. However, the imgRequestProxy still has a pointer to the memory that used to hold the class. imgRequestProxy holds a raw pointer to the listener, and assumes that pointer is valid until Cancel() is called. At that point, mCanceled is set, and the pointer is assumed to be no longer valid (this seems really hacky to me - why can't we just force callers to support weak references?). An example of checking mCanceled is here:
http://mxr.mozilla.org/mozilla-central/source/modules/libpr0n/src/imgRequestProxy.cpp#445

On trunk, this issue is harmless. imgRequestProxy actually holds a reference to the listener until the first OnStopRequest, so things are safe up until that point:
http://mxr.mozilla.org/mozilla-central/source/modules/libpr0n/src/imgRequestProxy.h#148

On trunk, imgRequestProxy is pretty much quiet after OnStopRequest, the big exception being multipart/x-mixed-replace. Since the icons are PNGs, this doesn't happen. So even though the stale pointer sits around in imgRequestProxy, nothing bad comes of it.

With my work in bug 435296, things are a different story. Under that patch, re-decodes after discard fire another set of notifications. When the OnStartDecode hits the imgRequestProxy held by the static gIconLoad in layout, it finds that the request has not been canceled (the static icon code, of course, does not want to cancel the request), and so it tries to fire an event off to the listener. Annoyingly, this only shows up in use cases where we load the icon once, stop drawing it, wait 15 seconds, and try to draw it again. I thankfully managed to run across this reliably by running the content/event mochitests.

Patch coming soon.
Blocks: 435296
Also, there seems to be a general confusion about whether mIconsLoaded is a boolean or an integer:
http://mxr.mozilla.org/mozilla-central/source/layout/generic/nsImageFrame.cpp#1718
http://mxr.mozilla.org/mozilla-central/source/layout/generic/nsImageFrame.cpp#1733
http://mxr.mozilla.org/mozilla-central/source/layout/generic/nsImageFrame.cpp#1046

yuck.

In general we should be querying GetImageStatus to determine if the icons are loaded, so I'll do that in this patch. The big question here is what we really mean by "loaded". In a decode-on-draw and discard world, loaded and decoded are 2 different things. My guess is that we want to only draw this when it's loaded and not to discard it, which would mean that we should query FRAME_COMPLETE rather than LOAD_COMPLETE (currently they're basically the same, but won't be after bug 435296 lands). However, this runs us into trouble if we're discarding or decoding on draw, because FRAME_COMPLETE might be false and stay false until we draw. We could fix this by forcing synchronous decode, but for the icons, we probably don't want to be discarding or decoding-on-draw.

This is a good argument in favor of modifying the LoadImage signature to accept flags which can disable d-o-d and discarding.
Do we have any test coverage for loading icons and broken image icons? If not, the loading (at least) might be a good use for the new mochitest functionality added in bug 396226.
Attached patch patch v1 (obsolete) — Splinter Review
attached a patch. pushed to try.
> this seems really hacky to me - why can't we just force callers to support weak 
> references?

Because in some cases the imgRequestProxy is the only thing keeping the listener alive.

I agree it's really hacky; the fact that imagelib forces callers to hold a strong ref to the imgRequestProxy to do anything useful with the image, combined with the fact that some image loads never stop and that the imgRequestProxy is what holds a ref to the caller means that a weak ref is needed _somewhere_.  Where it is now was the sanest place to put it at the time this leak cycle was fixed to not leak; maybe there's a better place now?  File bugs as needed!

> Do we have any test coverage for loading icons and broken image icons?

Not that I know of.
Attached patch patch v2 (obsolete) — Splinter Review
Updated patch. This one checks the status flags for SIZE_AVAILABLE instead of just checking for the container pointer, since the assumptions there will change with bug 435296.
Attachment #392158 - Attachment is obsolete: true
Attached patch tests (obsolete) — Splinter Review
So after I discovered that the behavior of our "loading" and "broken" icons was untested and underspecified, I spent quite a bit of time putting together a big test for the behavior. It's definitely overkill, but I can reuse the same framework to write tests for things like bug 475108, so I think it was worth it. Nonetheless, we now have pretty rockin' tests for our icon load behavior. Woo.

Flagging joe for review, because I know how much he loves tests. ;-)
Attachment #392341 - Flags: review?(joe)
Comment on attachment 392341 [details] [diff] [review]
tests

My concern here is that we have timing-sensitive stuff running on VMs that are often overloaded, which could lead to sporadic orange. Is there any way to test the same codepath but without setting up potentially brittle timers? (e.g., do file_iconTestServer.sjs?continue=true or whatever)
Attached patch tests v2 (obsolete) — Splinter Review
Addressed joe's review comments. I hadn't originally thought this was possible, but I discovered the getState/setState methods available to sjs handlers and it rocked my world.

This gets rid of the need for the timed interaction between the client and server, and speeds things up a lot. I left a 200ms timeout for between "request data from server" and "draw what is hopefully the loading icon" because I want to, if possible, also test that the loading icon remains even when we send the first 10 bytes of the png (ie, before we get to the part with the size and flag SIZE_AVAILABLE). I think this is ok, because even if something goes flaky with the timers the test will still pass (it just won't be testing absolutely everything it could be).

This version also adds the nice feature of making it possible to refresh the test and have it still pass by doing some clever clipping.

reflagging joe for review.
Attachment #392341 - Attachment is obsolete: true
Attachment #392469 - Flags: review?(joe)
Attachment #392341 - Flags: review?(joe)
Attached patch patch v3 (obsolete) — Splinter Review
updated patch to fix a missing null check introduced with patch v2. Pushing to try.
Attachment #392338 - Attachment is obsolete: true
also, ignore what I said about the locals being named the same thing. I was confused.
Attachment #392470 - Flags: review?(joe)
Attachment #392470 - Flags: review?(bzbarsky)
Comment on attachment 392470 [details] [diff] [review]
patch v3

try server all green with both patch and tests (except for rando-orange on winnt unit, which shows up on the previous run as well). Flagging bz and joe for review.
Some questions on this patch:

1)  In the new world, how do we handle updating the icon for a broken image if
    the icons were not loaded yet, or not fully loaded, when we started
    displaying the alt feedback for that image?

2)  In the new world, how are you handling animated icons?

3)  Why are you removing the NOISY_* defines?

4)  It's not clear to me why checking STATUS_FRAME_COMPLETE is the right thing
    in DisplayAltFeedback.  It doesn't help that the STATUS_* flags in
    imgIRequest are completely undocumented.  Please document them, esp. if
    we've recently changed what they mean (which it sounds like we have).

5)  It's not clear to me why checking STATUS_SIZE_AVAILABLE is the right thing
    to do in BuildDisplayList.  See item 4.

6)  Is the imgIContainer actually used in BuildDisplayList after this patch?
    If not, why are we getting it?

7)  I see nothing in the imgILoader documentation that allows passing a null
    imgIDecoderObserver to loadImage().  In practice, this seems to be
    supported; can you please update the documentation accordingly?
Attached patch patch v4 (obsolete) — Splinter Review
updated patch
Attachment #392470 - Attachment is obsolete: true
Attachment #392470 - Flags: review?(joe)
Attachment #392470 - Flags: review?(bzbarsky)
Attachment #392801 - Flags: review?(joe)
Attachment #392801 - Flags: review?(bzbarsky)
Comment on attachment 392801 [details] [diff] [review]
patch v4

reflagging for review.

1) Good point - I missed that subtlety. I started working up a patch to handle it, but soon realized that it was more than doubling the patch size and adding a ton of code for not much benefit.

Here's what handling this properly involves:
* Passing a non-null observer to loadImage
* We then need to make something that can act as an observer. This involves a lot of boilerplate code
* The observer then needs a way to notify interested nsImageFrames about the load of the icon. Thus, such frames have to register themselves with this observer, and an array of listeners has to be maintained
* Listeners can go away, so nsImageFrame would either need to support weak references or have conditional code in the destructor to remove itself from the listener list

This is a fair amount of code complication to handle one case unlikely. Given that the icons are loaded from the jar and less than 300 bytes, it's safe to say that they will be loaded "very soon" after the instantiation of the first nsImageFrame, which is also unlikely to be broken. Furthermore, I just made a build with some printf instrumentation, and it seems like the icon request always loads (OnStopRequest is called) before a display list is built for the first nsImageFrame. So emperically we're pretty safe. In the worst case, the code still does draws "some graffiti" by hand, so it's not like we're looking at major breakage or anything. As such, I advocate not handling this case.

2) I'm not handling animated icons. We only have 2 icons, they're both hardcoded into nsImageFrame.cpp, and neither is animated. It wasn't handled in the old world either, so I don't think it's worth worrying about.

3) the NOISY_IMAGE_LOADING define wasn't actually used anywhere. NOISY_ICON_LOADING was used in two places, one in the HandleIconLoad function (which I removed), and the other in LoadIcons(). Leaving it in would allow us to switch a #define and get a printf when the first nsImageFrame was instantiated (and thus the icon loads were kicked off). Icon loads aren't something a lot of people are going to be debugging, and anyone who wants to can stick their own printf in LoadIcons just as easily as they can switch the #define. So I got rid of it for cleanliness (I'd actually forgotten to get rid of the second use - did that in this updated patch).

4) Documentation added. Given that the icons are 16x16 and loaded from the filesystem, I don't think it's worth the hassle of adding the ability to progressively load them (also, I think it would be bad UI design). So FRAME_COMPLETE tells us that the image is actually ready for drawing. FRAME_COMPLETE hasn't changed meaning, and still won't under decode-on-draw. It's just that on trunk FRAME_COMPLETE and LOAD_COMPLETE get set at more or less exactly the same time for single-frame images, whereas they won't necessarily for decode-on-draw.

5) Ditto on the docs. I chose SIZE_AVAILABLE because it means we've started to get at least a trickle of image data in (probably the first 20 bytes or so, enough to figure out width/height). This seemed like a good time to stop displaying the loading icon given the lack of an AT_LEAST_ONE_PIXEL_AVAILABLE flag. I did this instead of the previous check for a non-null container because I don't want the semantics of when containers are instantiated to affect layout in subtle ways.

6) Yep. If the image is in good shape (so we don't need to paint altfeedback), we do:
rv = aLists.Content()->AppendNewToTop(new (aBuilder) nsDisplayImage(this, imgCon));

7) Sure.
> In the worst case, the code still does draws "some graffiti" by hand

Yes, and we had bugs about that, which is why the code is as it is, as I recall...  Please do check the blame?

> We only have 2 icons, they're both hardcoded into nsImageFrame.cpp, and neither
> is animated.

The icons are not built into the binary, are they?  Anyone embedding Gecko can drop in any icons desired.

> It wasn't handled in the old world either

That's a much more useful argument; I thought we used to handle it at some point, but maybe someone broke it already.  Or maybe I just misremember.

> Given that the icons are 16x16 and loaded from the filesystem

That's not necessarily the case in all embeddings, right?  I'm ok with the change in general, modulo item 1.

> I don't want the semantics of when containers are instantiated to affect
> layout in subtle ways.

Sounds good; can we document those semantics?
Note that I might be ok with not fixing item #1 above, depending, but do check the blame.  Do we have unit tests that could start failing due to that change, for example?
Attached patch patch v5 (obsolete) — Splinter Review
So after talking with bz and dbaron on IRC I decided that I was wrong about the "not-yet-loaded-broken-icon" case being too small to worry about. Browsing with images disabled, for example, can easily trigger this. I ran some tests and, at least on my machine, the broken image always loads between nsFrame init and the time the display list is built. However, I still manged to trigger the "red dot of death" by replacing the icon with an absurdly large jpeg, which an embedder could in theory do. Thus, the patch is tested.

Thus, this patch handles that case. It meant sticking in a lot of boilerplate decode listener methods, but that list should be reduced considerably when I fix bug 505385.

As an added bonus, the listening machinery meant it was easy to add animation support as well. So we now support animated images (tested, and tested that we didn't before).
Attachment #392801 - Attachment is obsolete: true
Attachment #392801 - Flags: review?(joe)
Attachment #392801 - Flags: review?(bzbarsky)
Attachment #393194 - Flags: review?(joe)
Attachment #393194 - Flags: review?(bzbarsky)
Comment on attachment 393194 [details] [diff] [review]
patch v5

reflagging for review.
Attachment #393194 - Flags: review?(bzbarsky) → review+
Comment on attachment 393194 [details] [diff] [review]
patch v5

>+        mIconObservers.AppendElementUnlessExists(frame);

Can it ever exist?  If not, can we do a much faster Append(), with maybe an assert that it doesn't exist?

Other than that, looks good.
Attached patch patch v6 (obsolete) — Splinter Review
updated patch to address bz's review comments.

Leaving bz's r+ on the above and reflagging joe on the new patch. repushed to try as well.
Attachment #393434 - Flags: review?(joe)
Attachment #393194 - Flags: review?(joe)
Attached patch tests v3 (obsolete) — Splinter Review
fixed a typo in the tests and added some useful debugging information. This test actually caught a regression introduced by bug 435296, so it's already earned its pay.

reflagging joe for review.

By the way joe, I'd appreciate if you could comment on my js style/usage in general, since I'm not really experienced with the language (or the DOM) at all.
Attachment #392469 - Attachment is obsolete: true
Attachment #393734 - Flags: review?(joe)
Attachment #392469 - Flags: review?(joe)
Attached patch patch v7 (obsolete) — Splinter Review
Updated patch. Changes are the following:

1) I didn't know that frames can be re-initialized. We now fix the flag on shutdown.

2) Fixed some other places where we were equating the existence of the imgContainer with having a size (see my comments about this above).

3) Got rid of some boilerplate checks of things that don't fail

4) Fixed some indenting

This is enough IMO to warrant another look from bz. Canceling the old review and reflagging.
Attachment #393194 - Attachment is obsolete: true
Attachment #393434 - Attachment is obsolete: true
Attachment #393841 - Flags: review?(joe)
Attachment #393841 - Flags: review?(bzbarsky)
Attachment #393434 - Flags: review?(joe)
Attachment #393734 - Flags: review?(joe) → review+
Comment on attachment 393734 [details] [diff] [review]
tests v3

>+  // Figure out whether the client wants to load the image, or just
>+  // to tell us that it's ready for us to finish
>+  var query = {};
>+  request.queryString.split('&').forEach(function (val) {
>+    var [name, value] = val.split('=');
>+    query[name] = unescape(value);
>+  });
>+  if (query["continue"] == "true") {
>+    setState("doContinue", "yes");
>+    response.finish();
>+    return;
>+  }

I don't think we have any location objects in SJS, sadly, but if we did, I've been told that we could use location.query.

>+  // After the first load, we remember the size of the last
>+  // successful image and keeps the frame that size, even through
>+  // a refresh (not sure if this is desired behavior). This means
>+  // that if you refresh the test, the "loading" icon for the test
>+  // image will appear with a border that stretches further right
>+  // and down, causing the verify stage to fail. To allow for
>+  // successful test refreshes, we add a clipping region so that we
>+  // see the left and top borders, along with the image, but not the
>+  // bottom and right borders.
>+  if (i < 4) {
>+    var ctx = can.getContext("2d");
>+    ctx.beginPath();
>+    ctx.rect(0,0, 30, 30);
>+    ctx.clip();
>+   }

Can you explain more about why this is necessary? Who remembers the size of the last successful image? And why?

>+  { "fn" : startServerLoad,          "trans" : 300 },

Is there a need for a timeout here?

>+//
>+// Enables and disables bordering/padding to mimic the look of alt feedback icons
>+//
>+function enableBorderAndPad() {
>+  divContainer.style.border = "1px";
>+  divContainer.style.borderStyle = "inset";
>+  testImageElem.style.padding = "3px";
>+}
>+
>+function disableBorderAndPad() {
>+  testImageElem.style.padding = 0;
>+  divContainer.style.border = "0px";
>+  divContainer.style.borderStyle = "";
>+}

I'm sad that we have to sort of hack up alt feedback icon support here, but I guess we haven't got any choice since it's hardcoded in the cpp file.
Attachment #393841 - Flags: review?(joe) → review+
It shouldn't be necessary to use timeouts as you do.  Just do an XHR to that path to act as a notification to resume processing again, and use setObjectState and getObjectState to stash away a continuation function to call.
> 1) I didn't know that frames can be re-initialized.

Uh.. say what?

The rest of the changes look fine.
(In reply to comment #25)
> > 1) I didn't know that frames can be re-initialized.
> 
> Uh.. say what?
> 
> The rest of the changes look fine.

When I was debugging this I set a breakpoint for a certain image url in ::Init, inspected the 'this' pointer, then broke at ::Destroy conditioned on 'this' being the same. I continued to destroy, say it execute, then removed that breakpoint and continued. I would then hit the breakpoint for a second time in ::Init (based on the url), and the 'this' pointer would be exactly the same as the first time. The backtrace was something like RecreateSomething, so I assumed that the frame was being re-initialized for some reason.

If this shouldn't happen, then it's possible that there was some memory pooling going on and the memory was being re-used (with manual destruction and placement new), and it was just a coincidence that I always saw 2 frames for 1 url using the exact same memory space.

If hypothesis #2 is correct, the line setting mDisplayingIcon is unnecessary (because the constructor is called again) but harmless. I'd prefer to leave it in just to avoid churn, but I can remove it too if you prefer. Let me know.

The last run through the tryserver for this patch was all green except for the following 2 orange pieces on linux:

27561 ERROR TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_load.html | [SimpleTest/SimpleTest.js, window.onerror] An error occurred: [ Script error. ]
27564 ERROR TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_load_candidates.html | [SimpleTest/SimpleTest.js, window.onerror] An error occurred: [ Script error. ]

I haven't seen those errors in previous runs of the same patch and they look very much unrelated, so unless anyone objects I'm considering them to be random and, assuming I get bz's r+, considering this ready for checkin.

Should I wait to check this in until after the branch?
Attached patch tests v4Splinter Review
Updated tests to address joe's review comments.

Jeff - Are you sure you're looking at the more recent tests? After joe's comments last week I did away with timeouts in all but one place (and that place is now documented).
Attachment #393734 - Attachment is obsolete: true
> If this shouldn't happen, then it's possible that there was some memory pooling
> going on and the memory was being re-used (with manual destruction and
> placement new)

Frames are allocated out of an arena with per-size freelists, so this is exactly what would happen in the scenario you describe.

> I'd prefer to leave it in just to avoid churn

I'd prefer to take it out, since it's unnecessary work.  Doesn't need extra review from me after taking it out, though.
Attachment #393841 - Flags: review?(bzbarsky) → review+
Comment on attachment 393841 [details] [diff] [review]
patch v7

r=bzbarsky with the extra set in Destroy removed.
Attached patch patch v8Splinter Review
Updated patch with the extra set remove.
Attachment #393841 - Attachment is obsolete: true
pushed to mc in 707f5ac27ac8 and 45a9af25197e

Resolving fixed.

Also, comment #30 should say "extra set removed".
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
/tests/layout/generic/test/test_bug507902.html seems to be failing intermittently.
See bug 510001
Attachment 394007 [details] [diff] most certainly appears to have timers in it to periodically check |getState("doContinue")|; maybe that's the one location to which you refer.  However, there's no reason that timer has to exist, and it can be removed as sketched out in comment 24.

Also, while we're on the topic, for a timer to work correctly you have to keep a reference to it.  Once |waitForContinueAndFinish| returns nothing keeps an explicit reference to that timer, so if you get unlucky and a GC happens just so the XPConnect references could all go away, the timer could die, and you'll be left hanging -- perhaps even resulting in a test timeout, conceivably.  No idea how often this would happen in practice, but it's certainly possible.
> for a timer to work correctly you have to keep a reference to it.

You do?  Why?  The timer thread holds a reference to armed timers, last I checked; I see no provisions inside timerimpl for removing that reference if the refcount goes to 1.
Huh.  Not sure how I missed that!
Attached patch orange fix v1Splinter Review
orange fix. flagging jwalden for review.

Jeff - sorry I didn't understand your comment the first time. I thought you were referring to the client side script.
Attachment #394131 - Flags: review?(jwalden+bmo)
Comment on attachment 394131 [details] [diff] [review]
orange fix v1

Looks good to me, but please file a bug on the remaining 300ms timeout you have for testing when the loading... image should display.  Even if it's not exposed to web content, which seems likely, there should be a way to be notified when the loading image is shown.
Attachment #394131 - Flags: review?(jwalden+bmo) → review+
pushed fix to mc in 9bcf62d5073b
Depends on: 513544
Target Milestone: --- → mozilla1.9.3a1
Depends on: 542096
Depends on: 686003
Product: Core → Core Graveyard
Product: Core Graveyard → Core
You need to log in before you can comment on or make changes to this bug.