Closed Bug 1290089 Opened 4 years ago Closed 4 years ago

Intermittent Assertion failure: !sPrefBrowserTabsRemoteAutostart || IsVistaOrLater(), at c:/builds/moz2_slave/m-beta-w32-d-00000000000000000/build/src/gfx/thebes/gfxPlatform.cpp:2091

Categories

(Core :: Graphics: Layers, defect)

defect
Not set

Tracking

()

RESOLVED WONTFIX
Tracking Status
firefox48 - affected

People

(Reporter: intermittent-bug-filer, Unassigned)

References

Details

(Keywords: intermittent-failure, Whiteboard: [gfx-noted])

milan, maybe you are interested in this.
Flags: needinfo?(milan)
Looks like that assertion was removed, I don't see it on central.
Milan might be on pto, Matt, can you help here?
Flags: needinfo?(matt.woodrow)
Blocks: 1282584
FYI, this showed up on beat when we uplifted bug 1282584 there. The assertion was removed at some point on central and I'm guessing aurora as well. We just need to know if this is serious or if we can ignore it until the assert gets merged away.
Tracking until we know more.
We are supposed to be disabling e10s on XP, that's what the assertion is trying to catch, see bug 1237769.

Bug 1282584 changed the way we check if e10s is enabled, so if this is failing it means we must have e10s enabled which could be bad.

George might know more.
Flags: needinfo?(matt.woodrow) → needinfo?(gwright)
It looks like the assert was removed (inadvertently?) by dvander in bug 1270650.

The issue here is that we make an assumption that e10s is on if we're a content process. The rationale in bug 1282584 was that we didn't think there was any reason for BrowserTabsRemoteAutostart() to be called in thumbnail processes. Turns out gfxPlatform's initialisation calls it, and this assert checks it.

This is probably a real issue, as all our gfx code will be initialised for e10s when in a thumbnail process.
Flags: needinfo?(gwright)
George, this bug is in RC2 that we are planning to ship in Fx 48 next Tuesday.
So, I need your help with understanding the impact, can we ship with this bug? If not, do you have an eta for a patch so that we can build a RC3.

Johny, Milan is in PTO, so, I would appreciate your input if you have any :)
Flags: needinfo?(jst)
Flags: needinfo?(gwright)
(btw, when this is fixed and over, can we have a test for this (as long as we support XP). Because the assertion failure was randomly in other failed tests - and so would have not catched if all tests would have passed).
Flags: in-testsuite?
Flags: in-qa-testsuite?
Andrei, can we have a quick check that 48 rc2 is still working on XP? Thanks
Maybe bug 1282584 caused new regressions.
sorry for the last minute request...
Flags: needinfo?(andrei.vaida)
(In reply to Sylvestre Ledru [:sylvestre] from comment #10)
> Andrei, can we have a quick check that 48 rc2 is still working on XP?
XP SP3 x86, everything's apparently fine: google search, youtube, flash games, wikipedia
Flags: needinfo?(andrei.vaida)
good news, thanks
As the impact to users seems limited, I won't track it anymore
Here's what we need to do, I think:

1) Revert the patch from bug 1282584
2) Fix the underlying issue for bug 1288751

bug 1288751 looks like it's caused by a11y prefs. Basically, if we load with e10s on, then it detects if a11y is enabled then sets a pref to say "a11y was enabled, disable e10s for future runs". This happens in the parent process. When a content process is spawned, it goes through that same code and checks the pref, sees that a11y was enabled then disables e10s. This causes a mismatch where the content process thinks e10s is not on, and the parent process thinks it is on.

I'll have a think about the best way to fix this. Hopefully I can come up with a solution today, but note that this weekend is a long weekend in Canada.

Jim: do you have any suggestions?
Flags: needinfo?(gwright) → needinfo?(jmathies)
(In reply to George Wright (:gw280) (:gwright) from comment #7)
> This is probably a real issue, as all our gfx code will be initialised for
> e10s when in a thumbnail process.

This is in the content process using a browser that's generating thumbnails. I'm curious if this is in fact an issue. Note this is a shared content process (content and thumbnailing) and we've already determined that we want to keep e10s running in this process until a restart to keep apz operating.

I was tracing around in the beta repo code for areas sPrefBrowserTabsRemoteAutostart influences. I don't see it playing much of a role in the content process. It definitely gets checked from chrome in various places. I'd love to track down why it broke apz which we might be able to one-off fix. But I'm not  convinced out content process is in an unstable state here because of it.

(In reply to George Wright (:gw280) (:gwright) from comment #15)
> Here's what we need to do, I think:
> 
> 1) Revert the patch from bug 1282584
> 2) Fix the underlying issue for bug 1288751
> 
> bug 1288751 looks like it's caused by a11y prefs. Basically, if we load with
> e10s on, then it detects if a11y is enabled then sets a pref to say "a11y
> was enabled, disable e10s for future runs". This happens in the parent
> process. When a content process is spawned, it goes through that same code
> and checks the pref, sees that a11y was enabled then disables e10s. This
> causes a mismatch where the content process thinks e10s is not on, and the
> parent process thinks it is on.
> 
> I'll have a think about the best way to fix this. Hopefully I can come up
> with a solution today, but note that this weekend is a long weekend in
> Canada.
> 
> Jim: do you have any suggestions?

We have a fix for the a11y bug which we could uplift, but it would mean that new installs with touchscreens would not be able to scroll, and would get a prompt asking them to restart. I think it would be better to try and keep the behavior we have now and look for any issues we need to address as fallout from bug 1282584.
Flags: needinfo?(jmathies)
Flags: needinfo?(jst)
(In reply to Jim Mathies [:jimm] from comment #16)
> This is in the content process using a browser that's generating thumbnails.
> I'm curious if this is in fact an issue. Note this is a shared content
> process (content and thumbnailing) and we've already determined that we want
> to keep e10s running in this process until a restart to keep apz operating.

In thinking a bit more about this - from my understanding, Milan's concern in bug 1282584 was that we could end up with the following situation:

1) user launches the browser with e10s enabled (content process launches)
2) a11y triggers a change of prefs that would result in BrowserTabsRemoteAutostart() returning false *except* that since we cache the result in nsAppRunner, it remains true for the run in chrome.
3) suppose the user closes all all content tabs, this shuts down the content process.
3.1) thumbnailing service impacts this decision. if it's running, the process remains alive. if it's not running, the child proces closes.
4) a new content tab opens, content process launches

Prior to Milan's change: the new content process checks BrowserTabsRemoteAutostart() and gets a return value of false, e10s features are disabled, scrolling breaks.

With Milan's change: the new content process checks BrowserTabsRemoteAutostart() and gets a return value of *true*, e10s features are enabled, scrolling works. The thumbnailing service is also free to operate in this process, and I think it will operate correctly since we are in a state where e10s is still technically enabled.

What impact does Milan's change have on background thumbnailing? I would say none since the browser is already in a state where it expect e10s to be operating.

In retrospect the right fix for bug 1282584 probably should have involved passing the result of BrowserTabsRemoteAutostart() from chrome to child rather than have the child check the prefs on its own.
There is one scenario I left out which we need to check on, the impact of Milan's change on background thumbnailing with e10s disabled. I think this is what gwright was getting at. We have tests for this service which are not failing, so my guess is things are ok. We should investigate further to be sure.
(In reply to Jim Mathies [:jimm] from comment #18)
> There is one scenario I left out which we need to check on, the impact of
> Milan's change on background thumbnailing with e10s disabled. I think this
> is what gwright was getting at. We have tests for this service which are not
> failing, so my guess is things are ok. We should investigate further to be
> sure.

Sorry, I should have been more clear. This is exactly what my concern is, and as I don't know what codepaths the thumbnail process exercises, I don't really have any insight on the potential impact here. The tests are a good start, though.
It looks like the thumbnail process is always supposed to be "e10s" anyway:

http://searchfox.org/mozilla-central/source/toolkit/components/thumbnails/BackgroundPageThumbs.jsm#196

I think this is a non-issue then.
Whiteboard: [gfx-noted]
Sounds like we're good then.
Status: NEW → RESOLVED
Closed: 4 years ago
Flags: needinfo?(milan)
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.