Closed Bug 779581 Opened 12 years ago Closed 8 years ago

Unrecoverable browser hang (seemingly in the network stack/image loader)

Categories

(Core :: Networking, defect)

16 Branch
x86_64
Windows 7
defect
Not set
major

Tracking

()

RESOLVED INCOMPLETE

People

(Reporter: kael, Unassigned)

References

()

Details

(Keywords: regression)

I accidentally opened up an old test document from an app that I worked on that used an embedded HTML renderer to display UI. For some reason, this document hangs Firefox solid without any crash/hang recovery possible.

I can't tell precisely what is going wrong, but if I look at it in the debugger, it seems to be trapped in an infinite loop attempting to load resources from a custom URI scheme that the embedded renderer exposed (editor://) - under normal circumstances i would just expect this to fail and not display a resource, since the loads are being triggered by <img> tags. Instead, it seems to display a few broken images and then completely choke and never recover.

I don't necessarily think that this page should *work* in FF, but opening it should definitely not make the browser choke and hang. Chrome handles it just fine and it used to work in FF back when I was working on it.
I tested in a local trunk build with a clean profile, yes.
it repros easily with a windows nightly. hard for me to get a stack right now cause i'm in constant meetings but the ui thread is obviously spinning at 100%
When I paused a few times in a local build in the debugger, it seemed to be stuck in nsImageLoadingContent::LoadImage, on a call to StringToURI. The actual URI in question seemed to bounce between two different URIs (both with the editor:// prefix), so probably two different images.
It's a regression. Disabling HWA doesn't help.

Mozregression range:
m-c
good=2010-11-16
bad=2010-11-17
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=a42e9b001bc8&tochange=ad227939db82
Keywords: regression
Maybe this suspected bug:
Ehsan Akhgari — Bug 612565 - Don't recurse into document modification listeners for editable documents if the modification to the document is made on the bogus BR node; r=roc a=blocking-beta8+ This fixes an unresponsive script dialog error for Etherpad pages (possibly among other websites).
Is anybody actively diagnosing this?  I could take a look...
I think the key part is this, in doneLoading:

  var doneLoading = function (evt) {
    tiles.appendChild(tile);

this will kick off a new load for the tile off the append, and when that load completes the append will happen again, which will do a remove and readd and start a _new_ load, etc.

And each time through will kick off a loadTile on the next index to boot.

We have some protection against unnecessary loads, but that relies on being able to create an nsIURI from the src, which may be what's failing here.

I have no idea why the regression range in comment 6 matters here.
The number of loads should be bounded though - iirc it's set to 200 - which would suggest that it should recover from the hang eventually.

Or are you saying that doneLoading has a bug because - for example - the load event can be fired multiple times, or the error event can be fired multiple times, or both events can be fired? I'll admit I don't think I ever considered those possibilities when writing this originally.
(In reply to Boris Zbarsky (:bz) [In and out Aug 1 - 10, out Aug 11-20] from comment #9)
> I have no idea why the regression range in comment 6 matters here.

Before 2010-11-16: no hang, Firefox is able to display the page with the HTML editor
http://i.imgur.com/HvyKU.png

After 2010-11-17: UI hangs, almost impossible to navigate in Firefox.
> Or are you saying that doneLoading has a bug

Yes.  Its error/load handler moves the image in the DOM, but moving the image starts a new load which calls error/load, etc.  So yes, they can fire multiple times on a given image, if you move it around, at least in Gecko right now.

> Before 2010-11-16: no hang

I get that.  What I don't see is what exactly in that range would have changed the behavior or why it used to work before.
Thanks Boris. Can you point me to the relevant part of the relevant spec(s) that explains what circumstances would cause multiple load/error events? This probably affects other applications I've written and it's probably something other developers I know have been doing too.

What approach would avoid triggering this? Would display: none not trigger a new load because it only affects layout and display, not DOM structure?
The spec situation here is ... unclear.  Moving a node in the DOM can change its base URI (due to xml:base).  So what we do is we start a load every time a node is moved in the DOM which the HTML spec says nothing about, but the HTML spec also doesn't consider xml:base.  Then we optimize the load away if the URI is the same as the URI of the last load and the last load didn't get blocked by content policy (this last check is because in general content policies depend on position in the DOM!).

In this case the URI is editor://something, which is an unknown URI scheme.  Unknown protocols are flagged as URI_DOES_NOT_RETURN_DATA, and nsNoDataProtocolContentPolicy blocks such loads for images so we don't end up optimizing away loads on movement in the DOM in this case.  But all of that has been in place since 2007 or so.
Please let me know if I can help with this bug somehow.  The stuff that is being discussed here is not very familiar to me, FWIW!
unfortunately the repro URL is gone
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.