Closed
Bug 1404020
Opened 7 years ago
Closed 7 years ago
stylo: Crash in CheckCrossStyleBackendAdoption
Categories
(Core :: CSS Parsing and Computation, defect, P4)
Tracking
()
RESOLVED
WORKSFORME
Tracking | Status | |
---|---|---|
firefox-esr52 | --- | unaffected |
firefox55 | --- | unaffected |
firefox56 | --- | unaffected |
firefox57 | --- | wontfix |
firefox58 | --- | fix-optional |
firefox59 | --- | fix-optional |
People
(Reporter: calixte, Unassigned)
References
(Blocks 1 open bug)
Details
(Keywords: crash, regression, Whiteboard: [clouseau])
Crash Data
This bug was filed from the Socorro interface and is
report bp-fa77c0d5-e5b1-4aa4-9527-033660170928.
=============================================================
There is 1 crash in nightly 58 with buildid 20170928100123. In analyzing the backtrace, the regression may have been introduced by patch [1] to fix bug 1403024.
[1] https://hg.mozilla.org/mozilla-central/rev?node=b8b62d3ba3799f380c30e5b8a41066a48b49a94e
Flags: needinfo?(xidorn+moz)
Comment 1•7 years ago
|
||
Notes from the crash report:
Servo, with style data: InlineStyle
Gecko doc: system, XUL, content-type: application/vnd.mozilla.xul+xml, url: chrome://browser/content/browser.xul
Updated•7 years ago
|
Priority: -- → P3
Comment 2•7 years ago
|
||
"e10sEnabled":false :/
Someone is trying to move an element from browser.xul to content document?
Comment 3•7 years ago
|
||
(In reply to Xidorn Quan [:xidorn] UTC+10 from comment #2)
> "e10sEnabled":false :/
>
> Someone is trying to move an element from browser.xul to content document?
Yeah, presumably somebody forcing legacy addons. Probably not worth spending time on, but we should see if any crashes appear on beta.
Comment 4•7 years ago
|
||
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #3)
> Yeah, presumably somebody forcing legacy addons. Probably not worth spending
> time on, but we should see if any crashes appear on beta.
Yeah... "extensions.legacy.enabled":true
Flags: needinfo?(xidorn+moz)
Comment 5•7 years ago
|
||
There are two more crash reports show up:
https://crash-stats.mozilla.com/report/index/cfb7f8cf-ad75-4818-a322-de3300170929
Gecko -> Servo, with style data: InlineStyle
Gecko doc: content-type: text/html, url: https: (omitted)
https://crash-stats.mozilla.com/report/index/5b03673b-0db7-4fa1-b8d7-05e710170929
Servo -> Gecko, with style data: InlineStyle
Gecko doc: content-type: text/html, url: https: (omitted)
Their Gecko docs are neither system nor XUL, and are just normal https html, how can this happen...
Comment 6•7 years ago
|
||
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #3)
> Yeah, presumably somebody forcing legacy addons. Probably not worth spending
> time on, but we should see if any crashes appear on beta.
they are on 57.0b4: https://crash-stats.mozilla.com/signature/?product=Firefox&release_channel=beta&signature=CheckCrossStyleBackendAdoption&date=%3E%3D2017-09-01#reports
Comment 7•7 years ago
|
||
The more I read the crash report, the more I find things confusing... I cannot really understand why an https page could get Gecko backend without being system or XUL...
Given that we are going to proceed bug 1403077, there would be a higher risk of violating this in more situations. Probably we should avoid asserting this and simply clear style data attached? (and reparse style attribute?)
bholley, what do you think?
Flags: needinfo?(bobbyholley)
Comment 8•7 years ago
|
||
Per the meeting, I'm pretty skeptical about adding code to handle this, given that it would be poorly-tested and we don't have an understanding of why this happens. a MOZ_RELEASE_ASSERT is safer, and I think the crash volume isn't a huge risk.
That said, I think we should continue trying to figure out what's going on. One theory is that the pref is flipping, which might be the result of people changing experiment groups (so they wouldn't know they'd done it). To that end, we could record whether the pref ever changes over the lifetime of the process, and include that in the crash report annotation. Alternatively, we could simply cache the value of StyloEnabled() on the document alongside the value of mStyleBackendType, and report whether they're different in the notes.
Another good thing to check would be to see if the two principals are EqualConsideringDomain.
Flags: needinfo?(bobbyholley)
Comment 9•7 years ago
|
||
Chris points out that we can figure out of a crash report came from somebody in the 5% Gecko control group by looking for |branch: "gecko"| in the TelemetryEnvironment.
I looked at three, and all three contained that field. We should verify a few more, but that probably gives us a pretty good idea of why this is happening.
If that is the answer here, I'd imagine the crash volume will drop as the experiment population stabilizes. We could also consider reading the stylo pref at startup on beta, but that does have drawbacks.
Comment 10•7 years ago
|
||
As of now we have 129 reports[1] for 7 days, and 114 reports [2] have 'branch: gecko'. Rest of reports are on disabled e10s other than this report <bp-3baec143-3e0e-4c8b-ae70-56d530171002>. This one is e10s but does not have 'branch: gecko'.
[1] https://crash-stats.mozilla.com/search/?signature=~CheckCrossStyleBackendAdoption&date=%3E%3D2017-09-26T03%3A06%3A00.000Z&date=%3C2017-10-03T03%3A06%3A00.000Z&_sort=-date&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature
[2] https://crash-stats.mozilla.com/search/?signature=~CheckCrossStyleBackendAdoption&telemetry_environment=~%22branch%22%3A%22gecko%22&date=%3E%3D2017-09-26T03%3A06%3A00.000Z&date=%3C2017-10-03T03%3A06%3A00.000Z&_sort=-date&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature
Comment 11•7 years ago
|
||
For the e10s disabled cases, I wonder if this is due to (a) chrome or extensions poking at the document, or (b) something to do with the e10s-disabling pref also being set a bit later.
Comment 12•7 years ago
|
||
(In reply to Cameron McCormack (:heycam) (away 4 Oct) from comment #11)
> For the e10s disabled cases, I wonder if this is due to (a) chrome or
> extensions poking at the document, or (b) something to do with the
> e10s-disabling pref also being set a bit later.
All e10s disabled cases are "Gecko -> Servo, url: chrome://browser/content/browser.xul", so it's (a)?
Comment 13•7 years ago
|
||
Sounds like it then.
Comment 14•7 years ago
|
||
Sounds like we probably don't need to do anything at the moment, and majority of crashes are from the pref-flip experiment.
Some of the non-e10s crash reports indicate there are legacy addons, but it is not clear to me whether they are enabled. I suppose that if the legacy addon pref is not listed in telemetry enviroment, they cannot be enabled? In that case, it sounds like something in Firefox's chrome code tries to do that.
But I guess the number for that case is quite low that it's probably not worth doing anything at the moment?
It would be great if the pref study can switch the pref only at the beginning of the process rather than in the middle.
![]() |
||
Updated•7 years ago
|
Reporter | ||
Comment 16•7 years ago
|
||
The crash is spiking in the last days in 58:
- the 2017-10-11 there were 4 single installations crashing;
- the 2017-10-12 there were 5;
- the 2017-10-13 there are 19.
Reporter | ||
Comment 17•7 years ago
|
||
The crash is spiking in 57 beta 7:
- beta 7: 205 crashes
- beta 6: 37 crashes
- beta 5: 12 crashes
Comment 18•7 years ago
|
||
I first suspected that bug 1403077 can somehow involves, but it seems bug 1403077 isn't there until beta 8, so luckily that doesn't matter.
Those again happens on normal content page, and I cannot think out any reason except flipping the perf, but this time there is no "branch":"gecko" anymore. Are we finishing the experiment so that those who were in the control group now start loading pages in Stylo backend again?
Chris, do you have any idea about this?
Flags: needinfo?(cpeterson)
Comment 19•7 years ago
|
||
(In reply to Xidorn Quan [:xidorn] UTC+10 from comment #18)
> I first suspected that bug 1403077 can somehow involves, but it seems bug
> 1403077 isn't there until beta 8, so luckily that doesn't matter.
>
> Those again happens on normal content page, and I cannot think out any
> reason except flipping the perf, but this time there is no "branch":"gecko"
> anymore. Are we finishing the experiment so that those who were in the
> control group now start loading pages in Stylo backend again?
Flags: needinfo?(cpeterson) → needinfo?(mgrimes)
Comment 20•7 years ago
|
||
(In reply to Xidorn Quan [:xidorn] UTC+10 from comment #18)
> I first suspected that bug 1403077 can somehow involves, but it seems bug
> 1403077 isn't there until beta 8, so luckily that doesn't matter.
>
> Those again happens on normal content page, and I cannot think out any
> reason except flipping the perf, but this time there is no "branch":"gecko"
> anymore. Are we finishing the experiment so that those who were in the
> control group now start loading pages in Stylo backend again?
Matt, is the Stylo pref-flipping experiment still running? I see the experiment population climbing in [1] but I no longer see the experiment names "pref-flip-quantum-css-style-r1-1381147" or "pref-flip-quantum-css-stylo-beta-1381147" in crash reports' Telemetry Environments or in my own about:telemetry, even though I see other experiment names such as "pref-flip-activity-stream-58-nightly-about-home-bug-1406513" and "e10sCohort".
[1] https://sql.telemetry.mozilla.org/queries/6154#table
Comment 21•7 years ago
|
||
Xidorn, if we add a domain to the Stylo blocklist #ifdef NIGHTLY_BUILD in bug 1407098, we could watch if the CheckCrossStyleBackendAdoption crash volume goes up or not. That might help us know whether this bug is related to pref flips (though, in the new case, not by the experiment).
This crash signature is content process top crash #8 in Beta 57 and #16 in Nightly 58. This is pretty high, so we should try to patch this crash somehow.
There have been 1365 reports since this crash started about a month ago and 100% of them are on Windows. Is there something about CheckCrossStyleBackendAdoption() that would make it only happen on Windows?
As for the legacy add-ons theory, I thought the "extensions.legacy.enabled" pref only worked in the Nightly channel? We see crash reports from both Nightly 58 and Beta 57.
Comment 22•7 years ago
|
||
(In reply to Chris Peterson [:cpeterson] from comment #21)
> Xidorn, if we add a domain to the Stylo blocklist #ifdef NIGHTLY_BUILD in
> bug 1407098, we could watch if the CheckCrossStyleBackendAdoption crash
> volume goes up or not. That might help us know whether this bug is related
> to pref flips (though, in the new case, not by the experiment).
That doesn't help. This is expected to happen only when some pages have been opened, and then the pref gets flipped, and the pages load some new subdocuments in iframe and tries to move element between them. I cannot think of any way it can happen if the perf is not changed after page open.
> This crash signature is content process top crash #8 in Beta 57 and #16 in
> Nightly 58. This is pretty high, so we should try to patch this crash
> somehow.
This is a gatekeeping crash to avoid further security issue from this kind of element adoption. I raised this concern in stylo meeting several weeks (?) ago, and at that time Bobby thought we shouldn't try to wallpaper it because that would be some code which is poorly tested. And I think at that time we concluded that it was low volume so wasn't a big concern. It doesn't seem to me anything has changed since then.
> There have been 1365 reports since this crash started about a month ago and
> 100% of them are on Windows. Is there something about
> CheckCrossStyleBackendAdoption() that would make it only happen on Windows?
I don't think it is something Windows-specific. Maybe we just don't have enough experiment population on other platforms?
Comment 23•7 years ago
|
||
(In reply to Xidorn Quan [:xidorn] UTC+10 from comment #22)
> > There have been 1365 reports since this crash started about a month ago and
> > 100% of them are on Windows. Is there something about
> > CheckCrossStyleBackendAdoption() that would make it only happen on Windows?
>
> I don't think it is something Windows-specific. Maybe we just don't have
> enough experiment population on other platforms?
We have reports on other platforms, e.g <bp-0e758f14-4b66-424c-b4d8-fe4890171015>. I have no idea why the symbol, CheckCrossStyleBackendAdoption, is not shown there.
Comment 24•7 years ago
|
||
(In reply to Hiroyuki Ikezoe (:hiro) from comment #23)
> (In reply to Xidorn Quan [:xidorn] UTC+10 from comment #22)
> > > There have been 1365 reports since this crash started about a month ago and
> > > 100% of them are on Windows. Is there something about
> > > CheckCrossStyleBackendAdoption() that would make it only happen on Windows?
> >
> > I don't think it is something Windows-specific. Maybe we just don't have
> > enough experiment population on other platforms?
>
> We have reports on other platforms, e.g
> <bp-0e758f14-4b66-424c-b4d8-fe4890171015>. I have no idea why the symbol,
> CheckCrossStyleBackendAdoption, is not shown there.
That may be because of inlining, then, since AdoptNode is the only callsite of that function.
Comment 25•7 years ago
|
||
(In reply to Xidorn Quan [:xidorn] UTC+10 from comment #24)
> (In reply to Hiroyuki Ikezoe (:hiro) from comment #23)
> > (In reply to Xidorn Quan [:xidorn] UTC+10 from comment #22)
> > > > There have been 1365 reports since this crash started about a month ago and
> > > > 100% of them are on Windows. Is there something about
> > > > CheckCrossStyleBackendAdoption() that would make it only happen on Windows?
> > >
> > > I don't think it is something Windows-specific. Maybe we just don't have
> > > enough experiment population on other platforms?
> >
> > We have reports on other platforms, e.g
> > <bp-0e758f14-4b66-424c-b4d8-fe4890171015>. I have no idea why the symbol,
> > CheckCrossStyleBackendAdoption, is not shown there.
>
> That may be because of inlining, then, since AdoptNode is the only callsite
> of that function.
Oh I see. It's cumbersome.
![]() |
||
Comment 26•7 years ago
|
||
This is the #5 Windows topcrash in Nightly 20171013100112.
Comment 27•7 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #26)
> This is the #5 Windows topcrash in Nightly 20171013100112.
It doesn't seem to me this should be bound to any specific version. It seems to be more time-related, that lots of crashes happen in some particular days.
This is the graph for nsIDocument::AdoptNode which is where the same crash happens before:
https://crash-stats.mozilla.com/signature/?signature=nsIDocument%3A%3AAdoptNode&date=%3E%3D2017-09-16T00%3A19%3A15.000Z&date=%3C2017-10-16T00%3A19%3A15.000Z#graphs
Bug 1403024 adds some diagnose information and changes the signature on Windows (other platforms seem to still be using nsIDocument::AdoptNode). And this time the crash graph looks like:
https://crash-stats.mozilla.com/signature/?signature=CheckCrossStyleBackendAdoption&date=%3E%3D2017-10-09T00%3A13%3A21.000Z&date=%3C2017-10-16T00%3A13%3A21.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_sort=-date&page=1#graphs
The pick number seems to be tripled this time, but like before the number drops very quick, so I suspect this is again because of some perf flipping study.
Another indirect evidence is that majority of the crashes happen in distinct instances, and those happen in the same instance are likely either the same crash gets reported multiple times, or multiple content processes crash at the same time (they have almost identical Install Age in crash reports).
Comment 28•7 years ago
|
||
> The pick number seems to be tripled this time, but like before the number drops very quick, so I suspect this is again because of some perf flipping study.
I mean, the peak number.
Comment 29•7 years ago
|
||
Ilana on the Strategy & Insights experiments team says that they accidentally turned off the Stylo experiment last Friday (October 13). That would reset the Gecko users (20% of Nightly 58 and 5% of Beta 57) back to Stylo, the default pref value. That might explain some of the CheckCrossStyleBackendAdoption crashes, except the most recent spike *before* Friday.
Ilana is going to re-enable the Stylo experiments today, so we will probably see more CheckCrossStyleBackendAdoption crashes as all those Nightly and Beta users rejoin the experiment as Gecko users.
Comment 30•7 years ago
|
||
(In reply to Chris Peterson [:cpeterson] from comment #29)
> That might explain some of the CheckCrossStyleBackendAdoption crashes, except the most recent spike
> *before* Friday.
I'm not seeing any spikes before October 13. Am I looking at the wrong graphs?
https://crash-stats.mozilla.com/signature/?signature=CheckCrossStyleBackendAdoption&date=%3E%3D2017-09-18T05%3A09%3A52.000Z&date=%3C2017-10-18T05%3A09%3A52.000Z#graphs
Comment 31•7 years ago
|
||
status-firefox57=wontfix unless someone thinks this bug should block 57
Updated•7 years ago
|
Comment 32•7 years ago
|
||
This signature is either infrequent or gone in 59 release.
Comment 33•7 years ago
|
||
CheckCrossStyleBackendAdoption no longer exists
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INCOMPLETE
Updated•7 years ago
|
Resolution: INCOMPLETE → WORKSFORME
Updated•7 years ago
|
Flags: needinfo?(mgrimes)
You need to log in
before you can comment on or make changes to this bug.
Description
•