Last Comment Bug 677050 - UI lock when HTML parser is used from a http-on-start-request observer triggered by an object load from innerHTML
: UI lock when HTML parser is used from a http-on-start-request observer trigge...
Status: VERIFIED FIXED
[qa!]
: verified-beta
Product: Core
Classification: Components
Component: HTML: Parser (show other bugs)
: Trunk
: x86_64 Windows XP
: -- critical with 2 votes (vote)
: mozilla10
Assigned To: Henri Sivonen (:hsivonen) (Not doing reviews or reading bugmail until 2016-08-01)
:
Mentors:
http://evil.hackademix.net/bugs/parse...
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2011-08-06 16:54 PDT by Giorgio Maone [:mao]
Modified: 2011-12-28 13:58 PST (History)
12 users (show)
hsivonen: in‑testsuite-
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
verified
verified


Attachments
Return an error from fragement parsing when re-entrant fragment parsing is attempted (17.62 KB, patch)
2011-09-04 07:26 PDT, Henri Sivonen (:hsivonen) (Not doing reviews or reading bugmail until 2016-08-01)
no flags Details | Diff | Splinter Review
Return an error from fragement parsing when re-entrant fragment parsing is attempted, without basic C++ FAIL (17.73 KB, patch)
2011-09-05 07:24 PDT, Henri Sivonen (:hsivonen) (Not doing reviews or reading bugmail until 2016-08-01)
bzbarsky: review+
Details | Diff | Splinter Review
Test case that doesn't really test the right thing (3.31 KB, patch)
2011-09-05 07:25 PDT, Henri Sivonen (:hsivonen) (Not doing reviews or reading bugmail until 2016-08-01)
no flags Details | Diff | Splinter Review
Stack trace (46.01 KB, text/plain)
2011-09-21 08:33 PDT, Henri Sivonen (:hsivonen) (Not doing reviews or reading bugmail until 2016-08-01)
no flags Details
Patch as landed with AutoRestore for reference (17.58 KB, patch)
2011-10-14 03:47 PDT, Henri Sivonen (:hsivonen) (Not doing reviews or reading bugmail until 2016-08-01)
jst: approval‑mozilla‑aurora+
jst: approval‑mozilla‑beta+
Details | Diff | Splinter Review

Description Giorgio Maone [:mao] 2011-08-06 16:54:45 PDT
If createContextualFragment() or innerHTML are used from a http-on-start-request observer (even on a document created on the fly, with no docShell) triggered by a OBJECT element inserted with innerHTML, every element in chrome and content gets locked (event processing seems suspended everywhere, including the scrollbars), but the native window decoration (e.g. the window close button) still stays responsive and the CPU stays idle, hinting at a breakage on the Gecko-side only.

This is reproducible always at http://evil.hackademix.net/bugs/parser_breaks_ui.html

Unfortunately this is also consistently reproducible on http://news.google.com if the selected region is U.S. and user has NoScript installed, and probably everywhere else a Flash player is dynamically created and uses <OBJECT> instead of <EMBED>.
Comment 1 Giorgio Maone [:mao] 2011-08-06 17:39:09 PDT
Notice: NoScript 2.1.2.6rc6 and above now have a work-around for the <OBJECT> case (spawning an internal redirect, thus deferring the processing to a later event when the main parsing job has been completed). However this is suboptimal and <OBJECT> in innerHTML may not be the only trigger of what actually looks like a reentrancy problem of the parser, even though across different unrelated documents.
Comment 2 Boris Zbarsky [:bz] 2011-08-06 19:25:02 PDT
That's really odd.  Something to do with the parser caching?

We really need to move to starting loads async.  :(
Comment 3 Henri Sivonen (:hsivonen) (Not doing reviews or reading bugmail until 2016-08-01) 2011-09-01 01:22:33 PDT
Fragment parsing isn't re-entrant, because fragment parsing isn't supposed to be able to run scripts synchronously and fragment parsing runs to completion without letting the event loop spin, so there's not supposed to be a way to call fragment parsing re-entrantly. Previously, fragment parsing wasn't re-entrant on a per-document basis. Now (since about a month ago, IIRC) it isn't re-entrant on a per-process basis.

When I made the current design, I had no idea that fragment parsing could trigger synchronous event handlers in extensions during fragment parsing. That's really, really bad.

Since NoScript already has a workaround, I'm inclined to add a re-entrancy check to fragment parsing to make fragment parsing fail if there's an active fragment parsing call on the stack. Would that be OK?
Comment 4 Giorgio Maone [:mao] 2011-09-01 01:54:35 PDT
(In reply to Henri Sivonen (:hsivonen) from comment #3)

> Since NoScript already has a workaround, I'm inclined to add a re-entrancy
> check to fragment parsing to make fragment parsing fail if there's an active
> fragment parsing call on the stack. Would that be OK?

That would be OK as long a the root problem gets solved as well, i.e. no load is started synchronously during fragment parsing, triggering various "load started" notifications (progress listeners are likely affected clients too).

If this re-entrancy check produces some meaningful verbose output on the error console it may serve as a tool to find other possible instances of this behavior and *fix* them by making the triggered event asynchronous (and if it's a load, still vetoable).
Comment 5 Boris Zbarsky [:bz] 2011-09-02 13:02:05 PDT
> That's really, really bad.

Yes.  We have some plans for making that not happen anymore, because it causes all sorts of issues.
Comment 6 Henri Sivonen (:hsivonen) (Not doing reviews or reading bugmail until 2016-08-01) 2011-09-04 07:26:18 PDT
Created attachment 558147 [details] [diff] [review]
Return an error from fragement parsing when re-entrant fragment parsing is attempted

(In reply to Giorgio Maone from comment #4)
> That would be OK as long a the root problem gets solved as well, i.e. no
> load is started synchronously during fragment parsing, triggering various
> "load started" notifications (progress listeners are likely affected clients
> too).

bz, is it realistic to make load starting async for the Firefox 8 train?

> If this re-entrancy check produces some meaningful verbose output on the
> error console it may serve as a tool to find other possible instances of
> this behavior

Can't do that. Firefox 8 is now string-frozen.

I'm attaching a patch that adds the re-entrancy check. AFAICT, none of our test cases cause re-entry.

bz, should I pursue writing a mochitest assuming that we aren't going to make the stream start async for Firefox 8 or should I expect us to be able to fix this more fully (in which case it's not worthwhile to write a test case that assumes sync stream starting).
Comment 7 Boris Zbarsky [:bz] 2011-09-04 07:30:17 PDT
> bz, is it realistic to make load starting async for the Firefox 8 train?

No.  I don't even think it's realistic for 9, unless someone starts working on it full-time today.  If I'm doing it, it _might_ be doable for 10 if I drop enough other stuff.

I estimate it to be several weeks of full-time work to do it and then fix the regressions...
Comment 8 Giorgio Maone [:mao] 2011-09-04 07:33:40 PDT
(In reply to Henri Sivonen (:hsivonen) from comment #6)
> Can't do that. Firefox 8 is now string-frozen.

Are really all the (error/warning/info) messages logged to the Error Console (which is not even supposed to be exposed to users by default anymore) localized?!
Comment 9 Henri Sivonen (:hsivonen) (Not doing reviews or reading bugmail until 2016-08-01) 2011-09-05 07:24:18 PDT
Created attachment 558278 [details] [diff] [review]
Return an error from fragement parsing when re-entrant fragment parsing is attempted, without basic C++ FAIL

Oops. My previous patch was full of basic C++ fail. :-(
Comment 10 Henri Sivonen (:hsivonen) (Not doing reviews or reading bugmail until 2016-08-01) 2011-09-05 07:25:27 PDT
Created attachment 558280 [details] [diff] [review]
Test case that doesn't really test the right thing

Why doesn't this test case actually manage to provoke the re-entrancy?
Comment 11 Henri Sivonen (:hsivonen) (Not doing reviews or reading bugmail until 2016-08-01) 2011-09-05 07:27:29 PDT
(In reply to Giorgio Maone from comment #8)
> Are really all the (error/warning/info) messages logged to the Error Console
> (which is not even supposed to be exposed to users by default anymore)
> localized?!

They are localized, yes.
Comment 12 Boris Zbarsky [:bz] 2011-09-07 18:53:58 PDT
> Why doesn't this test case actually manage to provoke the re-entrancy?

Which request are you getting the on-modify-request for?  Or are you not getting it at all?
Comment 13 Henri Sivonen (:hsivonen) (Not doing reviews or reading bugmail until 2016-08-01) 2011-09-19 03:57:48 PDT
(In reply to Boris Zbarsky (:bz) from comment #12)
> > Why doesn't this test case actually manage to provoke the re-entrancy?
> 
> Which request are you getting the on-modify-request for?

I get on-modify-request for http://example.com/ (as intended).
Comment 14 Boris Zbarsky [:bz] 2011-09-19 07:35:17 PDT
OK.  When you get it, what does the C callstack look like?
Comment 15 Henri Sivonen (:hsivonen) (Not doing reviews or reading bugmail until 2016-08-01) 2011-09-20 00:05:53 PDT
(In reply to Boris Zbarsky (:bz) from comment #14)
> OK.  When you get it, what does the C callstack look like?

I'm unable to tell. Even if I turn JITs off, the stack walker on Linux x86_64 is unable to walk past SharedStub, which seems to appear on the call stack even if I put the stack dumping behind an XPConnected API.
Comment 16 Boris Zbarsky [:bz] 2011-09-20 09:21:14 PDT
When you say "stack walker", do you mean ours, or gdb's, or both?
Comment 17 Henri Sivonen (:hsivonen) (Not doing reviews or reading bugmail until 2016-08-01) 2011-09-20 11:30:31 PDT
(In reply to Boris Zbarsky (:bz) from comment #16)
> When you say "stack walker", do you mean ours, or gdb's, or both?

I mean what NS_ASSERTION uses.
Comment 18 Boris Zbarsky [:bz] 2011-09-20 11:57:33 PDT
OK, can you attach gdb instead and see how it does?  Possibly including stepping out far enough to get out of the JS code...
Comment 19 Henri Sivonen (:hsivonen) (Not doing reviews or reading bugmail until 2016-08-01) 2011-09-21 08:33:20 PDT
Created attachment 561483 [details]
Stack trace

It looks like in the innerHTML case, a script blocker prevents actual re-entry.
Comment 20 Boris Zbarsky [:bz] 2011-09-21 11:03:29 PDT
Then I wonder what comment 0 was about....
Comment 21 Henri Sivonen (:hsivonen) (Not doing reviews or reading bugmail until 2016-08-01) 2011-09-22 05:02:18 PDT
I've spent so much time trying to reproduce the problem locally that I think it doesn't make sense to use more time on capturing the problem in a test case.

Do we want to land my fix anyway (without a test) or should this be WONTFIXed?
Comment 22 Boris Zbarsky [:bz] 2011-09-22 06:32:03 PDT
What's the user-visible effect of the patch?  Will attempts to reenter throw, or just silently do nothing?
Comment 23 Henri Sivonen (:hsivonen) (Not doing reviews or reading bugmail until 2016-08-01) 2011-09-22 06:39:01 PDT
(In reply to Boris Zbarsky (:bz) from comment #22)
> What's the user-visible effect of the patch?  Will attempts to reenter
> throw, or just silently do nothing?

It will throw. (Assuming there actually still is a way to trigger re-entry!)
Comment 24 Boris Zbarsky [:bz] 2011-09-22 06:46:26 PDT
OK.  I think landing that is fine, assuming consumers can deal...
Comment 25 Henri Sivonen (:hsivonen) (Not doing reviews or reading bugmail until 2016-08-01) 2011-09-23 00:42:54 PDT
Comment on attachment 558278 [details] [diff] [review]
Return an error from fragement parsing when re-entrant fragment parsing is attempted, without basic C++ FAIL

(In reply to Boris Zbarsky (:bz) from comment #24)
> assuming consumers can deal...

I'm guessing that it's less bad the throw an exception into extension JS code that to have browser C++ code run into an unexpected state.
Comment 26 Boris Zbarsky [:bz] 2011-10-11 12:38:16 PDT
Comment on attachment 558278 [details] [diff] [review]
Return an error from fragement parsing when re-entrant fragment parsing is attempted, without basic C++ FAIL

This looks fine, but was there a reason to not use AutoRestore<bool> instead of making up your own guard class?
Comment 27 Henri Sivonen (:hsivonen) (Not doing reviews or reading bugmail until 2016-08-01) 2011-10-14 03:42:30 PDT
(In reply to Boris Zbarsky (:bz) from comment #26)
> This looks fine, but was there a reason to not use AutoRestore<bool> instead
> of making up your own guard class?

The reason was that I was unaware of AutoRestore.

Landed with AutoRestore:
https://hg.mozilla.org/integration/mozilla-inbound/rev/2244089600ff
Comment 28 Henri Sivonen (:hsivonen) (Not doing reviews or reading bugmail until 2016-08-01) 2011-10-14 03:47:44 PDT
Created attachment 567046 [details] [diff] [review]
Patch as landed with AutoRestore for reference

Requesting approval for aurora and beta in order to avoid shipping two releases where extensions can confuse the app by invoking fragment parsing re-entrantly. The patch is pretty simple, so I believe this to be low-risk. I also believe that having fragment parsing throw an exception to extension code is a lot less bad than the Gecko internals getting into a broken state.
Comment 30 Boris Zbarsky [:bz] 2011-10-14 09:40:03 PDT
Do you actually need the friend declaration for AutoRestore?  I doubt it; you're passing it a reference to the bool to munge, so it doesn't have to refer to it by name and hence shouldn't need to be a friend.
Comment 31 christian 2011-10-18 09:40:15 PDT
(In reply to Henri Sivonen (:hsivonen) from comment #28)
> Created attachment 567046 [details] [diff] [review] [diff] [details] [review]
> Patch as landed with AutoRestore for reference
> 
> Requesting approval for aurora and beta in order to avoid shipping two
> releases where extensions can confuse the app by invoking fragment parsing
> re-entrantly. The patch is pretty simple, so I believe this to be low-risk.
> I also believe that having fragment parsing throw an exception to extension
> code is a lot less bad than the Gecko internals getting into a broken state.

What do you mean by two releases? That this is going to be a regression we are shipping in Fx8 or it shipped in Fx7? If so, what bug fix caused this?
Comment 32 Henri Sivonen (:hsivonen) (Not doing reviews or reading bugmail until 2016-08-01) 2011-10-19 04:35:02 PDT
(In reply to Boris Zbarsky (:bz) from comment #30)
> Do you actually need the friend declaration for AutoRestore?  I doubt it;
> you're passing it a reference to the bool to munge, so it doesn't have to
> refer to it by name and hence shouldn't need to be a friend.

You are right. The "friend" part is an unnecessary left-over from the version of the patch that had a one-off boolean flipping guard. Filed bug 695648. Sorry.

(In reply to Christian Legnitto [:LegNeato] from comment #31)
> (In reply to Henri Sivonen (:hsivonen) from comment #28)
> > Created attachment 567046 [details] [diff] [review] [diff] [details] [review] [diff] [details] [review]
> > Patch as landed with AutoRestore for reference
> > 
> > Requesting approval for aurora and beta in order to avoid shipping two
> > releases where extensions can confuse the app by invoking fragment parsing
> > re-entrantly. The patch is pretty simple, so I believe this to be low-risk.
> > I also believe that having fragment parsing throw an exception to extension
> > code is a lot less bad than the Gecko internals getting into a broken state.
> 
> What do you mean by two releases? That this is going to be a regression we
> are shipping in Fx8 or it shipped in Fx7? If so, what bug fix caused this?

The two releases I mean are Firefox 8 and Firefox 9. Bug 596182 introduced the new restriction that extensions must not invoke fragment parsing re-entrantly. Previously, the restriction was that it was not safe to invoke fragment parsing re-entrantly on one document, but extensions could get away with invoking fragment parsing on another document while one document's fragment parser was on the call stack. (Web content can't and Firefox-internal code doesn't invoke fragment parsing re-entrantly.)

As I understand it, the latest NoScript has already worked around the new restriction, so we don't need to land this for 8 & 9 to cater to NoScript. The motivation for having this patch in Firefox at all is defending against unknown other extensions that might use the same pattern that NoScript used. Now that NoScript has been fixed, such extensions might not exist. OTOH, if such extensions do exist, having the problem mitigation mechanism only in Firefox 10 is a bit late when the risk is already in Firefox 8.

The fix for bug 596182 is good for memory footprint, which is why I think it should not be backed out. Also, at this point, adding this patch here is way lower risk than undoing the patch for bug 596182.
Comment 33 Johnny Stenback (:jst, jst@mozilla.com) 2011-10-20 14:14:37 PDT
Comment on attachment 567046 [details] [diff] [review]
Patch as landed with AutoRestore for reference

Approved for aurora and beta. It seems safer to take this pretty straight forward fix than it is to back out the huge patch that caused this, and it also seems safer to take this fix than it does to not do anything.
Comment 34 Henri Sivonen (:hsivonen) (Not doing reviews or reading bugmail until 2016-08-01) 2011-10-21 09:25:47 PDT
Thanks.

https://hg.mozilla.org/releases/mozilla-beta/rev/235ede1d26ef
https://hg.mozilla.org/releases/mozilla-aurora/rev/672f3bb231e3
Comment 35 Virgil Dicu [:virgil] [QA] 2011-12-12 08:10:49 PST
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:9.0) Gecko/20100101 Firefox/9.0
Mozilla/5.0 (X11; Linux x86_64; rv:9.0) Gecko/20100101 Firefox/9.0 
Mozilla/5.0 (Windows NT 6.1; rv:9.0) Gecko/20100101 Firefox/9.0
Mozilla/5.0 (Windows NT 5.2; WOW64; rv:9.0) Gecko/20100101 Firefox/9.0

Verified on Firefox 9 Beta 5, on Ubuntu 11.10, Mac OS 10.6, XP and Windows 7 using the STR from comment 0.

1. Install No Script 2.1.2.3.
2. Load http://evil.hackademix.net/bugs/parser_breaks_ui.html
3. Allow  hackademix.net
4. Reload page.

The UI worked as normal after following these steps.
Comment 36 Virgil Dicu [:virgil] [QA] 2011-12-28 05:52:10 PST
Mozilla/5.0 (Windows NT 5.1; rv:10.0) Gecko/20100101 Firefox/10.0
Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20100101 Firefox/10.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:10.0) Gecko/20100101 Firefox/10.0
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:10.0) Gecko/20100101 Firefox/10.0

Verified on Firefox 10 Beta1. No problems occurred with the steps from comment 35. Setting resolution to Verified Fixed.

Note You need to log in before you can comment on or make changes to this bug.