Closed Bug 1209124 Opened 9 years ago Closed 9 years ago

Startup crash in v41 nsStyleSet::GatherRuleProcessors(nsStyleSet::sheetType) possibly related to Yandex toolbar *and* Adblock Plus

Categories

(Core :: CSS Parsing and Computation, defect)

41 Branch
x86
Windows NT
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla44
Tracking Status
firefox42 --- fixed
firefox43 --- fixed
firefox44 --- fixed
relnote-firefox --- 41+

People

(Reporter: away, Assigned: heycam)

References

Details

(Keywords: crash)

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-1a47824e-7d37-494e-bb22-c92b92150923.
=============================================================

This is #4 on the topcrash scores on 41.0 mostly due to being a startup crash. It is highly correlated with "ru" locale setting. My initial instinct is that it's probably due to Yandex toolbar, but I can't confirm right now since correlation files are currently broken.

0 	xul.dll 	nsStyleSet::GatherRuleProcessors(nsStyleSet::sheetType) 	layout/style/nsStyleSet.cpp
1 	xul.dll 	nsStyleSet::EndUpdate() 	layout/style/nsStyleSet.cpp
2 	xul.dll 	PresShell::AddUserSheet(nsISupports*) 	layout/base/nsPresShell.cpp
3 	xul.dll 	PresShell::Observe(nsISupports*, char const*, wchar_t const*) 	layout/base/nsPresShell.cpp
4 	xul.dll 	nsObserverList::NotifyObservers(nsISupports*, char const*, wchar_t const*) 	xpcom/ds/nsObserverList.cpp
5 	xul.dll 	nsObserverService::NotifyObservers(nsISupports*, char const*, wchar_t const*) 	xpcom/ds/nsObserverService.cpp
6 	xul.dll 	nsStyleSheetService::LoadAndRegisterSheet(nsIURI*, unsigned int) 	layout/base/nsStyleSheetService.cpp
7 	xul.dll 	NS_InvokeByIndex 	xpcom/reflect/xptcall/md/win32/xptcinvoke.cpp
8 	xul.dll 	XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode) 	js/xpconnect/src/XPCWrappedNative.cpp
9 	xul.dll 	XPC_WN_CallMethod(JSContext*, unsigned int, JS::Value*) 	js/xpconnect/src/XPCWrappedNativeJSOps.cpp
10 	xul.dll 	js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) 	js/src/vm/Interpreter.cpp
On each of the aurora, beta, and release channels, this definitely looks to have been introduced on the 41 train (insufficient data for nightly).
http://hg.mozilla.org/releases/mozilla-release/annotate/78c82e5cd777/layout/style/nsStyleSet.cpp#l526

PresContext() crashes because mRuleTree is null.

dbaron - is this something that a toolbar would be able to cause? Has anything changed along this codepath in version 41?
Flags: needinfo?(dbaron)
I spot-checked a dozen reports and they all have in common these extensions:
 	yasearch@yandex.ru
 	{d10d0bf8-f5b5-c8b4-a8b2-2b9879e08c5d} (Adblock Plus)
So mRuleTree being null means that nsStyleSet::Shutdown got called, I believe.  Or, I guess mRuleTree::Init never got called...

This is called from PresShell::Destroy.  The ordering there is mStyleSet->BeginShutdown(), some other cleanup, mStyleSet->Shutdown().

The stack above shows JS code from Adblock Plus, presumably, running and adding a user sheet, which sends an observer notification, which the presshell is responding to.  Presshell registers for these observer notifications in PresShell::Init and removes itself from observing them in PresShell::Destroy, before shutting down the styleset.

Which is to say I'm not quite sure how we can end up with a presshell that has a shutdown style set but is still getting the observer notification.  :(

That said, there are some bandaids we could apply here.

1)  nsStyleSet::GatherRuleProcessors could bail out if mInShutdown.  This should be
    enough to stop the crash if we actually had nsStyleSet::Shutdown called on us.
2)  We could check for mIsDestroying in PresShell::Observe and bail out if so.  That would
    fix things if our presshell is somehow receiving observer notifications even though
    it's RemoveObserver'd itself.

Does the "yasearch@yandex.ru" addon include binary blobs?  If not, it might be interesting to see what it's actually doing to cause this...
(In reply to David Major [:dmajor] from comment #2)
> http://hg.mozilla.org/releases/mozilla-release/annotate/78c82e5cd777/layout/
> style/nsStyleSet.cpp#l526
> 
> PresContext() crashes because mRuleTree is null.
> 
> dbaron - is this something that a toolbar would be able to cause? Has
> anything changed along this codepath in version 41?

The big thing that changed in 41 in this area is bug 77999.  I don't have any ideas immediately, but maybe heycam does.
Flags: needinfo?(dbaron) → needinfo?(cam)
Summary: Startup crash in v41 nsStyleSet::GatherRuleProcessors(nsStyleSet::sheetType) possibly related to Yandex toolbar → Startup crash in v41 nsStyleSet::GatherRuleProcessors(nsStyleSet::sheetType) possibly related to Yandex toolbar *and* Adblock Plus
Note that the crashing code was entirely added in bug 77999.  Until then, this function didn't use to call PresContext().
The correlations files are working again: (I removed the 0% lines)

    100% (84/84) vs.   3% (558/19077) yasearch@yandex.ru (Yandex.Bar, https://addons.mozilla.org/addon/3495)
          1% (1/84) vs.   0% (1/19077) 8.13.0
          6% (5/84) vs.   0% (17/19077) 8.13.1
         76% (64/84) vs.   2% (425/19077) 8.13.5.2
          4% (3/84) vs.   0% (5/19077) 8.8.1
         13% (11/84) vs.   0% (40/19077) 8.9.2

     90% (76/84) vs.   3% (609/19077) vb@yandex.ru
          4% (3/84) vs.   0% (14/19077) 2.17.0
         12% (10/84) vs.   0% (48/19077) 2.18.0
         73% (61/84) vs.   2% (445/19077) 2.22.1
          2% (2/84) vs.   0% (33/19077) 2.22.5.2

     98% (82/84) vs.  20% (3872/19077) {d10d0bf8-f5b5-c8b4-a8b2-2b9879e08c5d} (Adblock Plus, https://addons.mozilla.org/addon/1865)
          2% (2/84) vs.   1% (98/19077) 2.6.10
         95% (80/84) vs.  20% (3734/19077) 2.6.11
(In reply to Boris Zbarsky [:bz] from comment #6)
> Note that the crashing code was entirely added in bug 77999.  Until then,
> this function didn't use to call PresContext().

Yeah, it looks like GatherRuleProcessors could have happily done its work even when mRuleTree is null, before bug 77999.

If we're in here after style set shutdown, then bailing out seems like a reasonable band-aid.  If we haven't called nsStyleSet::Init yet then bailing out early is probably OK -- we'll surely end up calling GatherRuleProcessors soon anyway.

At the top of the stack is a Promise callback, which makes me think that it's more likely we're being being called after style set shutdown.

There are plenty of NS_ENSURE_FALSE(mInShutdown, ...) calls at the top of methods in nsStyleSet, incidentally.
I tried installing Yandex.Bar 8.13.5.2 and ABP 2.6.11 but couldn't reproduce the crash.
Both Yandex.Bar and ABP call loadAndRegisterSheet(..., USER_SHEET), and Yandex.Bar uses promises a bit, although I couldn't find an obvious .then(... loadAndRegisterSheet(..., USER_SHEET) ...) call.

I lean towards adding an NS_ENSURE_FALSE(mInShutdown, NS_ERROR_FAILURE) at the top of GatherRuleProcessors.
Flags: needinfo?(cam)
(In reply to Cameron McCormack (:heycam) (away Oct 2) from comment #10)
> I lean towards adding an NS_ENSURE_FALSE(mInShutdown, NS_ERROR_FAILURE) at
> the top of GatherRuleProcessors.

The only thing returning NS_ERROR_FAILURE here would affect is making nsStyleSet::EndUpdate break out of its loop.  Nothing seems to check nsStyleSet::EndUpdate's return value.
Sounds reasonable.  I don't quite see how this is happening, but it's worth seeing if it makes the crash go away on aurora and beta.
(In reply to Cameron McCormack (:heycam) (away Oct 2) from comment #8)
> At the top of the stack is a Promise callback, which makes me think that
> it's more likely we're being being called after style set shutdown.

It's also less likely we're called before nsStyleSet::Init, since we call nsStyleSet::Init just before we set nsPresShell::mStyleSet.
(In reply to Boris Zbarsky [:bz] from comment #4)
> 2)  We could check for mIsDestroying in PresShell::Observe and bail out if
> so.  That would
>     fix things if our presshell is somehow receiving observer notifications
> even though
>     it's RemoveObserver'd itself.

I think this is reasonable to do as well.  Let's do it.

As an aside, in PresShell::Observe we check for mStyleSet not being null -- i.e. checking we're not called before the pres shell has had its Init called -- although we do set mStyleSet before we register the observer so I don't think we'll ever have mStyleSet being null in there.
Attached patch patchSplinter Review
https://treeherder.mozilla.org/#/jobs?repo=try&revision=72f23da8beec
Assignee: nobody → cam
Status: NEW → ASSIGNED
Attachment #8667071 - Flags: review?(dbaron)
https://hg.mozilla.org/mozilla-central/rev/d22d047dd978
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla44
dbaron, to save some time, would you mind filling out the uplift on behalf of heycam?
Flags: needinfo?(dbaron)
Comment on attachment 8667071 [details] [diff] [review]
patch

Approval Request Comment
[Feature/regressing bug #]: bug 77999
[User impact if declined]: probably continue crashing with combination of yandex toolbar and adblock plus
[Describe test coverage new/current, TreeHerder]: none (can't reproduce crash, either)
[Risks and why]: low risk; skips some work when document already partly destroyed
[String/UUID change made/needed]: no
Flags: needinfo?(dbaron)
Attachment #8667071 - Flags: approval-mozilla-beta?
Attachment #8667071 - Flags: approval-mozilla-aurora?
David, do you believe this patch is safe to be uplifted to mozilla-release for inclusion in 41.0.1 release? If yes, could you please nominate for uplift to m-r? DMajor brought it to my attention again today and suggested we take it in 41.
Flags: needinfo?(dbaron)
If we're going to take it in 41.0.1, I might want to take only the nsStyleSet.cpp half of the patch there -- at least depending on the schedule for 41.0.1.  (It would probably be good to get confirmation that the fix works as well, but sounds like there's not time for that.)
Flags: needinfo?(dbaron)
Comment on attachment 8667071 [details] [diff] [review]
patch

Approval Request Comment:  see comment 19

For release, I'd like to land the nsStyleSet.cpp patch only.
Attachment #8667071 - Flags: approval-mozilla-release?
Curious if you agree if that's a reasonable strategy.
Flags: needinfo?(bzbarsky)
That seems fine, yes.
Flags: needinfo?(bzbarsky)
Comment on attachment 8667071 [details] [diff] [review]
patch

Approved for aurora uplift.
Comment on attachment 8667071 [details] [diff] [review]
patch

Given that AdBlock plus is one of our top addons, fixing crashes in that area meets the 41 dot release criteria bar. Approving for uplift to m-r.
Attachment #8667071 - Flags: approval-mozilla-release? → approval-mozilla-release+
Comment on attachment 8667071 [details] [diff] [review]
patch

Turning the right flag on, please see Liz's comment 25.
Attachment #8667071 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
We tried to reproduce this initial crash on various OS`s (Windows 10 32-bit, Windows 7 64-bit, Mac OS X 10.10.5 and Ubuntu 14.04 32-bit) with Firefox 41 RC (affected) using Yandex toolbar and Adblock Plus but with no luck. 

Also did not reproduce with latest Firefox 41.0.1 build 2 on the same OS`s. 
I'm not going to mark the flag as verified until Socorro does not show any more crashes on latest builds.
Comment on attachment 8667071 [details] [diff] [review]
patch

Taking it to beta too.
Attachment #8667071 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Added to the release notes with "Fix a startup crash related to Yandex toolbar and Adblock Plus (1209124)" as wording.

I've spent a few hours chasing this bug down and unfortunately the classifications here in the bug tracker made it an extremely difficult ordeal.

  • This bug is (only?) triggered via recursive observer notifications pertaining to document stylesheets, which can occur when a stylesheet is loaded in an idiosyncratic fashion, almost certainly only if the observation is triggered by mutation of a <style>...</style> or <link rel="stylesheet" ... /> element by an asynchronously executed script,
  • The crash isn't the actual bug here, it's brought on by an infinite loop resulting in 100% CPU and continuous memory ballooning, probably leading to an allocation failure that ultimately results in mRuleTree being null,
  • This bug may be surfaced in the Yandex Toolbar or the AdBlock Plus code but has nothing to do with them (a class of AdBlock rules work by injecting style rules into the document to hide ad elements, triggered on page request (based off the URL) independent of the content (i.e. asynchronously prior to the load event),
  • This bug is not fixed in Firefox 41. The partial patch that was backported to Firefox 41 was insufficient to address the issue, and this bug reproduces in Firefox 41.0.2,
  • This bug was not introduced in Firefox 41; I can reproduce it going back to at least Firefox 25 (although perhaps the symptoms caught and reported by the telemetrics differ based off of what happens after the memory explodes),
  • For the record, this bug last reproduces in Firefox 42.0 beta 2 and is resolved in Firefox 42.0 beta 3 and above.

The status of this bug being incorrectly marked as fixed in Firefox 41 meant that it did not appear in any bugzilla queries for bugs affecting Firefox 41 and fixed in Firefox 42, so I needed to manually bisect archived release builds then track down the relevant changesets (only(GECKO420b3_2015100117_RELBRANCH, GECKO420b2_2015092813_RELBRANCH), for the record).

The bug is easily reproducible with the following code:

function loadCss(url: string) {
    return new Promise((resolve, reject) => {
        const link = document.createElement("link");
        link.rel = "stylesheet";
        link.type = "text/css";
        link.href = url;

        // We know that this document contains no styles currently in use,
        // so allow this to be loaded asynchronously without delaying page
        // load by giving it a `media` other than `screen`:
        link.media = "notscreen";
        link.addEventListener("load", function () {
            // Trigger actually loading the stylesheet:
            link.media = "screen";
            resolve();
        });
        link.addEventListener("error", reject);

        head.appendChild(link);
    });
}

loadCss("path/to/file.css");

started asynchronously on document request (transpiled as needed depending on the version of Firefox you intend to target).

In case it's not obvious what causes the recursive observer notification, it's the link.media = "screen"; line being unconditionally executed. The change of link.media from screen to screen (on second+ invocation) is (falsely) detected as a change and results in this crash. The following code is not affected by this issue:

function loadCss(url: string) {
    return new Promise((resolve, reject) => {
        const link = document.createElement("link");
        link.rel = "stylesheet";
        link.type = "text/css";
        link.href = url;

        // We know that this document contains no styles currently in use,
        // so allow this to be loaded asynchronously without delaying page
        // load by giving it a `media` other than `screen`:
        link.media = "notscreen";
        link.addEventListener("load", function () {
            // Trigger actually loading the stylesheet:
            if (link.media !== "screen") {
                link.media = "screen";
            }
            resolve();
        });
        link.addEventListener("error", reject);

        head.appendChild(link);
    });
}

loadCss("path/to/file.css");

To the best of my knowledge all other contemporaries of the affected Firefox versions handle this correctly (including versions of IE and Chrome from the same era), and I'm unsure whether or the issue with recursive invocations of the notification observer still persists in current Firefox (with the fallout handled per the changesets targeting this bug).

(Of course all this could be a separate bug, but the stack trace in the error reports indicates it's unlikely to be the case.)

Comment 34 seems to be describing bug 1203973.

That is not the same as this bug.

probably leading to an allocation failure that ultimately results in mRuleTree being null

Allocation failure for mRuleTree resulted in an immediate process abort going back to at least Firefox 18 (see bug 809533). So no, this is not what happened.

But most importantly, this bug is about a crash during startup. The code in comment 34 is not terribly likely to be running at startup...

Bug 1203973 is marked as being fixed in Firefox 18, so either it was reintroduced between then and Firefox 25 or it's not the same issue.

I'm not sure why the code in comment 34 can't have been used by the Yandex toolbar and the AdBlock Plus plugin at startup (although I confess not knowing at which point in startup this crash was happening, which does make a difference). If it's at startup prior to requesting any pages (even a local start tab or start page); as I understand it, XUL can use stylesheets for skinning, and who knows what the Yandex plugin loaded (in the background?) at startup.

I was not implying that the allocation of mRuleTree itself was erroring, but rather that an unhandled (or incorrectly handled) allocation failure in the code leading to its initialization/assignment perhaps resulted in that state. But I am nowhere near being familiar enough with the codebase to definitively state that's the case.

btw, the code in comment 34 can likely (untested) be replicated via simply the following:

<link rel="stylesheet" type="text/css" media="notscreen" href="path/to/file.css" onload="this.media='screen'" />

which is a lot less arcane and webapp-y.

Bug 1203973 is marked as being fixed in Firefox 18

No, it's marked fixed in Firefox 42, which matches your conclusions about when your testcase was fixed. Did you accidentally look at the resolution information for bug 809533 instead?

I'm not sure why the code in comment 34 can't have been used by the Yandex toolbar and the AdBlock Plus plugin at startup

It could have, but then the stack would show the recursion, and it's not showing that.

but rather that an unhandled (or incorrectly handled) allocation failure in the code

It's possible in theory, but for what it's worth allocation failure is generally immediately fatal in Gecko and has been for a long time (dating back to before Firefox 20).

No, it's marked fixed in Firefox 42, which matches your conclusions about when your testcase was fixed. Did you accidentally look at the resolution information for bug 809533 instead?

You're right, that's exactly what happened.

Mea culpa, I stand corrected.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: