Closed Bug 677050 Opened 13 years ago Closed 13 years ago

UI lock when HTML parser is used from a http-on-start-request observer triggered by an object load from innerHTML

Categories

(Core :: DOM: HTML Parser, defect)

x86_64
Windows XP
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla10
Tracking Status
firefox8 --- verified
firefox9 --- verified

People

(Reporter: ma1, Assigned: hsivonen)

References

()

Details

(Keywords: verified-beta, Whiteboard: [qa!])

Attachments

(4 files, 1 obsolete file)

If createContextualFragment() or innerHTML are used from a http-on-start-request observer (even on a document created on the fly, with no docShell) triggered by a OBJECT element inserted with innerHTML, every element in chrome and content gets locked (event processing seems suspended everywhere, including the scrollbars), but the native window decoration (e.g. the window close button) still stays responsive and the CPU stays idle, hinting at a breakage on the Gecko-side only.

This is reproducible always at http://evil.hackademix.net/bugs/parser_breaks_ui.html

Unfortunately this is also consistently reproducible on http://news.google.com if the selected region is U.S. and user has NoScript installed, and probably everywhere else a Flash player is dynamically created and uses <OBJECT> instead of <EMBED>.
Notice: NoScript 2.1.2.6rc6 and above now have a work-around for the <OBJECT> case (spawning an internal redirect, thus deferring the processing to a later event when the main parsing job has been completed). However this is suboptimal and <OBJECT> in innerHTML may not be the only trigger of what actually looks like a reentrancy problem of the parser, even though across different unrelated documents.
That's really odd.  Something to do with the parser caching?

We really need to move to starting loads async.  :(
Fragment parsing isn't re-entrant, because fragment parsing isn't supposed to be able to run scripts synchronously and fragment parsing runs to completion without letting the event loop spin, so there's not supposed to be a way to call fragment parsing re-entrantly. Previously, fragment parsing wasn't re-entrant on a per-document basis. Now (since about a month ago, IIRC) it isn't re-entrant on a per-process basis.

When I made the current design, I had no idea that fragment parsing could trigger synchronous event handlers in extensions during fragment parsing. That's really, really bad.

Since NoScript already has a workaround, I'm inclined to add a re-entrancy check to fragment parsing to make fragment parsing fail if there's an active fragment parsing call on the stack. Would that be OK?
(In reply to Henri Sivonen (:hsivonen) from comment #3)

> Since NoScript already has a workaround, I'm inclined to add a re-entrancy
> check to fragment parsing to make fragment parsing fail if there's an active
> fragment parsing call on the stack. Would that be OK?

That would be OK as long a the root problem gets solved as well, i.e. no load is started synchronously during fragment parsing, triggering various "load started" notifications (progress listeners are likely affected clients too).

If this re-entrancy check produces some meaningful verbose output on the error console it may serve as a tool to find other possible instances of this behavior and *fix* them by making the triggered event asynchronous (and if it's a load, still vetoable).
> That's really, really bad.

Yes.  We have some plans for making that not happen anymore, because it causes all sorts of issues.
(In reply to Giorgio Maone from comment #4)
> That would be OK as long a the root problem gets solved as well, i.e. no
> load is started synchronously during fragment parsing, triggering various
> "load started" notifications (progress listeners are likely affected clients
> too).

bz, is it realistic to make load starting async for the Firefox 8 train?

> If this re-entrancy check produces some meaningful verbose output on the
> error console it may serve as a tool to find other possible instances of
> this behavior

Can't do that. Firefox 8 is now string-frozen.

I'm attaching a patch that adds the re-entrancy check. AFAICT, none of our test cases cause re-entry.

bz, should I pursue writing a mochitest assuming that we aren't going to make the stream start async for Firefox 8 or should I expect us to be able to fix this more fully (in which case it's not worthwhile to write a test case that assumes sync stream starting).
Assignee: nobody → hsivonen
Status: NEW → ASSIGNED
> bz, is it realistic to make load starting async for the Firefox 8 train?

No.  I don't even think it's realistic for 9, unless someone starts working on it full-time today.  If I'm doing it, it _might_ be doable for 10 if I drop enough other stuff.

I estimate it to be several weeks of full-time work to do it and then fix the regressions...
(In reply to Henri Sivonen (:hsivonen) from comment #6)
> Can't do that. Firefox 8 is now string-frozen.

Are really all the (error/warning/info) messages logged to the Error Console (which is not even supposed to be exposed to users by default anymore) localized?!
Oops. My previous patch was full of basic C++ fail. :-(
Attachment #558147 - Attachment is obsolete: true
Why doesn't this test case actually manage to provoke the re-entrancy?
(In reply to Giorgio Maone from comment #8)
> Are really all the (error/warning/info) messages logged to the Error Console
> (which is not even supposed to be exposed to users by default anymore)
> localized?!

They are localized, yes.
> Why doesn't this test case actually manage to provoke the re-entrancy?

Which request are you getting the on-modify-request for?  Or are you not getting it at all?
(In reply to Boris Zbarsky (:bz) from comment #12)
> > Why doesn't this test case actually manage to provoke the re-entrancy?
> 
> Which request are you getting the on-modify-request for?

I get on-modify-request for http://example.com/ (as intended).
OK.  When you get it, what does the C callstack look like?
(In reply to Boris Zbarsky (:bz) from comment #14)
> OK.  When you get it, what does the C callstack look like?

I'm unable to tell. Even if I turn JITs off, the stack walker on Linux x86_64 is unable to walk past SharedStub, which seems to appear on the call stack even if I put the stack dumping behind an XPConnected API.
When you say "stack walker", do you mean ours, or gdb's, or both?
(In reply to Boris Zbarsky (:bz) from comment #16)
> When you say "stack walker", do you mean ours, or gdb's, or both?

I mean what NS_ASSERTION uses.
OK, can you attach gdb instead and see how it does?  Possibly including stepping out far enough to get out of the JS code...
Attached file Stack trace
It looks like in the innerHTML case, a script blocker prevents actual re-entry.
Then I wonder what comment 0 was about....
I've spent so much time trying to reproduce the problem locally that I think it doesn't make sense to use more time on capturing the problem in a test case.

Do we want to land my fix anyway (without a test) or should this be WONTFIXed?
What's the user-visible effect of the patch?  Will attempts to reenter throw, or just silently do nothing?
(In reply to Boris Zbarsky (:bz) from comment #22)
> What's the user-visible effect of the patch?  Will attempts to reenter
> throw, or just silently do nothing?

It will throw. (Assuming there actually still is a way to trigger re-entry!)
OK.  I think landing that is fine, assuming consumers can deal...
Comment on attachment 558278 [details] [diff] [review]
Return an error from fragement parsing when re-entrant fragment parsing is attempted, without basic C++ FAIL

(In reply to Boris Zbarsky (:bz) from comment #24)
> assuming consumers can deal...

I'm guessing that it's less bad the throw an exception into extension JS code that to have browser C++ code run into an unexpected state.
Attachment #558278 - Flags: review?(bzbarsky)
Comment on attachment 558278 [details] [diff] [review]
Return an error from fragement parsing when re-entrant fragment parsing is attempted, without basic C++ FAIL

This looks fine, but was there a reason to not use AutoRestore<bool> instead of making up your own guard class?
Attachment #558278 - Flags: review?(bzbarsky) → review+
(In reply to Boris Zbarsky (:bz) from comment #26)
> This looks fine, but was there a reason to not use AutoRestore<bool> instead
> of making up your own guard class?

The reason was that I was unaware of AutoRestore.

Landed with AutoRestore:
https://hg.mozilla.org/integration/mozilla-inbound/rev/2244089600ff
Whiteboard: [inbound]
Target Milestone: --- → mozilla10
Requesting approval for aurora and beta in order to avoid shipping two releases where extensions can confuse the app by invoking fragment parsing re-entrantly. The patch is pretty simple, so I believe this to be low-risk. I also believe that having fragment parsing throw an exception to extension code is a lot less bad than the Gecko internals getting into a broken state.
Attachment #567046 - Flags: approval-mozilla-beta?
Attachment #567046 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/mozilla-central/rev/2244089600ff
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Whiteboard: [inbound]
Do you actually need the friend declaration for AutoRestore?  I doubt it; you're passing it a reference to the bool to munge, so it doesn't have to refer to it by name and hence shouldn't need to be a friend.
(In reply to Henri Sivonen (:hsivonen) from comment #28)
> Created attachment 567046 [details] [diff] [review] [diff] [details] [review]
> Patch as landed with AutoRestore for reference
> 
> Requesting approval for aurora and beta in order to avoid shipping two
> releases where extensions can confuse the app by invoking fragment parsing
> re-entrantly. The patch is pretty simple, so I believe this to be low-risk.
> I also believe that having fragment parsing throw an exception to extension
> code is a lot less bad than the Gecko internals getting into a broken state.

What do you mean by two releases? That this is going to be a regression we are shipping in Fx8 or it shipped in Fx7? If so, what bug fix caused this?
(In reply to Boris Zbarsky (:bz) from comment #30)
> Do you actually need the friend declaration for AutoRestore?  I doubt it;
> you're passing it a reference to the bool to munge, so it doesn't have to
> refer to it by name and hence shouldn't need to be a friend.

You are right. The "friend" part is an unnecessary left-over from the version of the patch that had a one-off boolean flipping guard. Filed bug 695648. Sorry.

(In reply to Christian Legnitto [:LegNeato] from comment #31)
> (In reply to Henri Sivonen (:hsivonen) from comment #28)
> > Created attachment 567046 [details] [diff] [review] [diff] [details] [review] [diff] [details] [review]
> > Patch as landed with AutoRestore for reference
> > 
> > Requesting approval for aurora and beta in order to avoid shipping two
> > releases where extensions can confuse the app by invoking fragment parsing
> > re-entrantly. The patch is pretty simple, so I believe this to be low-risk.
> > I also believe that having fragment parsing throw an exception to extension
> > code is a lot less bad than the Gecko internals getting into a broken state.
> 
> What do you mean by two releases? That this is going to be a regression we
> are shipping in Fx8 or it shipped in Fx7? If so, what bug fix caused this?

The two releases I mean are Firefox 8 and Firefox 9. Bug 596182 introduced the new restriction that extensions must not invoke fragment parsing re-entrantly. Previously, the restriction was that it was not safe to invoke fragment parsing re-entrantly on one document, but extensions could get away with invoking fragment parsing on another document while one document's fragment parser was on the call stack. (Web content can't and Firefox-internal code doesn't invoke fragment parsing re-entrantly.)

As I understand it, the latest NoScript has already worked around the new restriction, so we don't need to land this for 8 & 9 to cater to NoScript. The motivation for having this patch in Firefox at all is defending against unknown other extensions that might use the same pattern that NoScript used. Now that NoScript has been fixed, such extensions might not exist. OTOH, if such extensions do exist, having the problem mitigation mechanism only in Firefox 10 is a bit late when the risk is already in Firefox 8.

The fix for bug 596182 is good for memory footprint, which is why I think it should not be backed out. Also, at this point, adding this patch here is way lower risk than undoing the patch for bug 596182.
Comment on attachment 567046 [details] [diff] [review]
Patch as landed with AutoRestore for reference

Approved for aurora and beta. It seems safer to take this pretty straight forward fix than it is to back out the huge patch that caused this, and it also seems safer to take this fix than it does to not do anything.
Attachment #567046 - Flags: approval-mozilla-beta?
Attachment #567046 - Flags: approval-mozilla-beta+
Attachment #567046 - Flags: approval-mozilla-aurora?
Attachment #567046 - Flags: approval-mozilla-aurora+
Whiteboard: [qa+]
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:9.0) Gecko/20100101 Firefox/9.0
Mozilla/5.0 (X11; Linux x86_64; rv:9.0) Gecko/20100101 Firefox/9.0 
Mozilla/5.0 (Windows NT 6.1; rv:9.0) Gecko/20100101 Firefox/9.0
Mozilla/5.0 (Windows NT 5.2; WOW64; rv:9.0) Gecko/20100101 Firefox/9.0

Verified on Firefox 9 Beta 5, on Ubuntu 11.10, Mac OS 10.6, XP and Windows 7 using the STR from comment 0.

1. Install No Script 2.1.2.3.
2. Load http://evil.hackademix.net/bugs/parser_breaks_ui.html
3. Allow  hackademix.net
4. Reload page.

The UI worked as normal after following these steps.
Keywords: verified-beta
Whiteboard: [qa+] → [qa+], [qa!:9]
Mozilla/5.0 (Windows NT 5.1; rv:10.0) Gecko/20100101 Firefox/10.0
Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20100101 Firefox/10.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:10.0) Gecko/20100101 Firefox/10.0
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:10.0) Gecko/20100101 Firefox/10.0

Verified on Firefox 10 Beta1. No problems occurred with the steps from comment 35. Setting resolution to Verified Fixed.
Status: RESOLVED → VERIFIED
Whiteboard: [qa+], [qa!:9] → [qa!], [qa!:9], [qa!:10]
Whiteboard: [qa!], [qa!:9], [qa!:10] → [qa!]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: