Bug 1303113 (Closed) - Opened 8 years ago, Closed 8 years ago

[meta] Turn e10s-multi on in Nightly

Categories

(Core :: General, defect)

Priority: Not set
Severity: normal

Tracking


RESOLVED FIXED
mozilla54
Tracking Status
firefox54 --- fixed

People

(Reporter: mrbkap, Assigned: gkrizsanits)

References

Details

(Whiteboard: [e10s-multi:M1])

Attachments

(1 file)

We would like to turn e10s-multi on in Nightly (we'll start with 2 content processes and increase from there).

One question Erin asked over email is whether we want to segment the Nightly population. It would probably make sense to do so. We should also probably figure out what percentage of Nightly users are already using processCount > 1.
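For reference, a minimal sketch of what this amounts to in pref terms, assuming the standard content-process-count pref (dom.ipc.processCount); where the actual patch flips the default may differ:

  // Sketch only: inspect/override the content process count from the
  // Browser Console on Nightly. The value 2 matches the initial plan above.
  Services.prefs.setIntPref("dom.ipc.processCount", 2);
  console.log(Services.prefs.getIntPref("dom.ipc.processCount"));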
Depends on: 1312022
As we discussed, this patch turns it on for the full population. Since the goal is to get more bug reports, and the purpose of Nightly is to test features like this, I would not split the population. We can do that later on Aurora or Beta, when we want to gather performance numbers and crash stats before release.
Attachment #8808105 - Flags: review?(mrbkap)
Ryan, could you please take a look at the current state: https://treeherder.mozilla.org/#/jobs?repo=try&revision=3b2458ebae8d&selectedJob=30546722

Some of the intermittents we had might have gotten a bit worse; it's hard to tell. I've re-triggered some tests. Could you help me decide whether we're good to go, or whether we should disable some of them and re-enable them later?
Flags: needinfo?(ryanvm)
Oh, and ignore the test_browserElement_oop_PrivateBrowsing.html failures; that test should have been disabled already...
bc4 on linux debug and bc7 on linux64 debug seem a bit too orange, but the failures are typically timeouts from known intermittents; I'm not sure what to do about them.
We discussed this on IRC for a bit. The linux32 browser_hsts-priming failures are extremely frequent on m-c tip as well (and probably heading for disabling anyway). There are some leaks that are still concerning, and it's not clear what the situation on linux64 debug will be once browser_tab_dragdrop2.js and browser_tabkeynavigation.js get sorted out. Anyway, I think we're getting close, but it would be good to take another look after some of those bigger issues get cleaned up.
Flags: needinfo?(ryanvm)
Attachment #8808105 - Flags: review?(mrbkap) → review+
BC7 failures were all over the place with the last push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=38f16c7a63ffba51bfdbcf4ddba5b4153792b4b6&selectedJob=30678695

but I think it was unrelated, because today, after rebasing the patch, the issue is gone:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=6999b4394c7fe32f78b0e78695034023f7651a7e
Pushed by gkrizsanits@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/adfcc194af1c
Turning off a test for e10s-multi temporarily. r=me
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/e1f230913306b5ab63a64438c12f76e8fbde8c62 for landing just 5 days too early.

Originally, 52 was going to merge to Aurora on the 7th, and now would have been fine, but it got pushed back a week. So now this landed and massively destabilized our tests, shutting off quite a few of them for both single and multi e10s right before next Monday's merge, while people are trying to shove in things at the last minute that they actually want to ship in that branch, which is going to be an ESR. So we really *really* don't want to disable tests that we will never remember to re-enable on Aurora, where we'll be back to single e10s.

Happy to see this land again next Monday morning (Pacific time Monday morning; you can tell it's safe to land when you see a mozilla-central push talking about tagging and bumping the version number). Even happier if by then it doesn't have some of the bustage that is already apparent from it, like the leak in https://treeherder.mozilla.org/logviewer.html#?job_id=38930104&repo=mozilla-inbound (one of the clearest examples of why I don't want to see it landed right now: that failure is bustage which landed a few pushes before you, with your leak hiding below it), the failure in https://treeherder.mozilla.org/logviewer.html#?job_id=38930048&repo=mozilla-inbound, the leaks for sure and probably the timeout itself in https://treeherder.mozilla.org/logviewer.html#?job_id=38931136&repo=mozilla-inbound, and the failures in https://treeherder.mozilla.org/logviewer.html#?job_id=38938876&repo=mozilla-inbound.
The browser_tab_dragdrop2.js failures were fixed by turning browser_tab_dragdrop.js off (it was enabled and caused browser_tab_dragdrop2.js to fail very frequently).

The hsts tests are really frustrating: we disabled all of them for linux32-debug this week, and now linux64-debug is failing very frequently as of yesterday (possibly related to backing this out?).
as a note this had a large impact on talos:
== Change summary for alert #4055 (as of November 09 2016 13:46 UTC) ==

Regressions:

270%  tabpaint summary osx-10-10 opt e10s       58.11 -> 214.79
263%  tabpaint summary windowsxp pgo e10s       50.45 -> 183.01
223%  tabpaint summary windows7-32 pgo e10s     49.69 -> 160.26
217%  tabpaint summary linux64 pgo e10s         59.5 -> 188.54
214%  tabpaint summary linux64 opt e10s         68.86 -> 216.27
205%  tabpaint summary windows8-64 opt e10s     59.03 -> 180.29
196%  tabpaint summary windowsxp opt e10s       73.17 -> 216.68
177%  tabpaint summary windows7-32 opt e10s     71.44 -> 197.61
 67%  tps summary windows7-32 pgo e10s          33.25 -> 55.64
 58%  tps summary windows7-32 opt e10s          40.02 -> 63.24
 42%  tps summary osx-10-10 opt e10s            39.78 -> 56.29
 27%  tps summary windowsxp opt e10s            39.5 -> 50.02
 25%  tps summary windowsxp pgo e10s            33.74 -> 42.19
 16%  damp summary windows7-32 pgo e10s         222.94 -> 259.04
 14%  damp summary windows8-64 opt e10s         267.11 -> 303.89
 13%  damp summary osx-10-10 opt e10s           302.7 -> 342.71
 11%  damp summary linux64 opt e10s             293.99 -> 326.18
 10%  damp summary linux64 pgo e10s             245.15 -> 270.28
  9%  tps summary linux64 opt e10s              41.7 -> 45.47
  8%  damp summary windows7-32 opt e10s         300.24 -> 325.07
  8%  tps summary linux64 pgo e10s              36.23 -> 39.22

Improvements:

  3%  tps summary windows8-64 opt e10s     37.1 -> 35.86

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=4055


please address these talos issues prior to landing this again.
(In reply to Joel Maher ( :jmaher) from comment #11)
> please address these talos issues prior to landing this again.

Let's branch this off: bug 1317312.
Depends on: 1317312
No longer depends on: 1251963, 1290167, 1294389
Depends on: 1324428
https://treeherder.mozilla.org/#/jobs?repo=try&revision=55581ed910f5a53008bcc5e0bfe35a2cd1e4a51a&selectedJob=65004753

I'm going to do another try push now that the Talos regressions are fixed. I had to fix browser_service_workers_status.js and force a single content process in one more test; otherwise this patch is pretty much what I tried last time. Then I will work on the new intermittent failures a bit if necessary and re-enable the tests.
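As an illustration of "force a single content process in one more test", a minimal, hypothetical sketch using the standard SpecialPowers pref helper (not the actual patch; the task name is made up):

  add_task(async function force_single_content_process() {
    // Pin this test to one content process; the pref change is reverted
    // automatically when the test finishes.
    await SpecialPowers.pushPrefEnv({
      set: [["dom.ipc.processCount", 1]],
    });
    // ... the rest of the test runs with a single content process ...
  });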
Depends on: 1328358
Depends on: 1328359
Depends on: 1328360
Depends on: 1328362
Depends on: 1328366
Depends on: 1328368
Depends on: 1328371
Depends on: 1328372
Depends on: 1328374
Depends on: 1328376
Depends on: 1328377
Depends on: 1328379
Depends on: 1328380
Depends on: 1328381
Depends on: 1328382
Depends on: 1328384
Depends on: 1328387
Depends on: 1328389
Depends on: 1328390
Depends on: 1328392
Depends on: 1328395
Depends on: 1328396
Depends on: 1328426
Depends on: 1328427
Depends on: 1328428
as a note, this caused a performance regression in the damp (devtools) test:
== Change summary for alert #4699 (as of January 03 2017 17:16 UTC) ==

Regressions:

 12%  damp summary linux64 opt e10s     329.61 -> 369.16

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=4699

As this has been backed out, we are not filing a new bug.
Depends on: 1330018
(In reply to Joel Maher ( :jmaher) from comment #16)
> as a note, this caused a performance regression in the damp (devtools) test:
> == Change summary for alert #4699 (as of January 03 2017 17:16 UTC) ==
> 
> Regressions:
> 
>  12%  damp summary linux64 opt e10s     329.61 -> 369.16
> 
> For up to date results, see:
> https://treeherder.mozilla.org/perf.html#/alerts?id=4699

I think we will have to swallow this. But I will take a look at whether I can make any improvement, other than making sure that devtools do not start a new process (which might be what we will want in the end).
Flags: needinfo?(gkrizsanits)
Hey Kris, do you know anything about these oop extension tests and why they fail with multiple content processes? Also, do you mind if I turn them off temporarily, and once they're fixed we can turn them back on again? This is blocking us.
Flags: needinfo?(gkrizsanits) → needinfo?(kmaglione+bmo)
(In reply to Wes Kocher (:KWierso) from comment #18)
> I backed this out for failures like
> https://treeherder.mozilla.org/logviewer.html#?job_id=70408216&repo=mozilla-
> inbound
> 
> https://hg.mozilla.org/integration/mozilla-inbound/rev/bde3fc40b9b5

I have not seen this on ash (I might have overlooked it). Is this a frequent/perma orange or a one-off intermittent?
Flags: needinfo?(wkocher)
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #20)
> (In reply to Wes Kocher (:KWierso) from comment #18)
> > I backed this out for failures like
> > https://treeherder.mozilla.org/logviewer.html#?job_id=70408216&repo=mozilla-
> > inbound
> > 
> > https://hg.mozilla.org/integration/mozilla-inbound/rev/bde3fc40b9b5
> 
> I have not seen this on ash (I might have overlooked it). Is this a
> frequent/perma orange or a one-off intermittent?

It's essentially permafailing, at least on Windows debug: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&fromchange=e2f6478f748157bf82a5fd0e940a6043af076a77&bugfiler&group_state=expanded&noautoclassify&filter-searchStr=windows%20mochi%20s(5%20debug
Flags: needinfo?(wkocher)
The test_ext_cookies failure looks like a manifestation of bug 1309637. I'd be OK with disabling that test in Windows debug builds until that bug is fixed, but I'm a bit worried about e10s-multi triggering that bug.

In the case of test_ext_storage_content, it looks like we're somehow running the test twice, in parallel, and the two instances are conflicting. I can't reproduce this locally on Linux, but it's definitely worrisome, and we can't just disable this test. I'll look into it some more.

I suspect the test_ext_i18n and test_ext_unload_frame problems may be similar, but I can't reproduce them locally either.
Flags: needinfo?(kmaglione+bmo)
OK, it actually looks like this is a cascade of failures caused by the test_ext_cookies timeout, which leads to extra windows and tabs staying open and breaks the other tests. So let's just disable that test on Windows debug builds.
(In reply to Pulsebot from comment #24)
> Pushed by gkrizsanits@mozilla.com:
> https://hg.mozilla.org/integration/mozilla-inbound/rev/0c891a3aff93
> Turn e10s-multi on in Nightly. r=me

Just to make this clear, this is the one that got backed out; Pulsebot was just slower than the backout.

(In reply to Kris Maglione [:kmag] from comment #25)
> OK, it actually looks like this is a cascade of failures caused by the
> test_ext_cookies timeout, which leads to extra windows and tabs staying open
> and breaks the other tests. So let's just disable that test on Windows
> debug builds.

Thanks Kris, I'll do that. Although it seems like this failure might happen on linux as well (as Wes pointed out), so I might have to turn it off for all debug builds temporarily.
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #26)
> Thanks Kris, I'll do that. Although it seems like this failure might happen
> on linux as well (as Wes pointed out), so I might have to turn it off for
> all debug builds temporarily.

Actually I see failures in release mode as well: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&selectedJob=70726841

Not too frequent yet... might have to do a followup and turn it off in release mode as well. It would be nice if there were a way to turn off the oop version only...
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #28)
> Not too frequent yet... might have to do a followup and turn it off in
> release mode as well. It would be nice if there were a way to turn off the
> oop version only...

There is, but the failures aren't only in OOP mode. Those just happen to show up first.
OK, we can't disable that entire test, but we can disable the private browsing section that's failing until bug 1309637 is fixed:

http://searchfox.org/mozilla-central/rev/30fcf167af036aeddf322de44a2fadd370acfd2f/toolkit/components/extensions/test/mochitest/test_ext_cookies.html#182-215
(In reply to Kris Maglione [:kmag] from comment #32)
> OK, we can't disable that entire test, but we can disable the private
> browsing section that's failing until bug 1309637 is fixed:
> 
> http://searchfox.org/mozilla-central/rev/
> 30fcf167af036aeddf322de44a2fadd370acfd2f/toolkit/components/extensions/test/
> mochitest/test_ext_cookies.html#182-215

By disabling, do you mean I should just remove that part of the test, and it will be put back once bug 1309637 is fixed?
Flags: needinfo?(gkrizsanits)
I'd rather add an `if (false)` to the beginning of that block than remove it, and add a comment about bug 1309637, but yes.
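For illustration, a sketch of what that suggestion could look like in test_ext_cookies.html (the body is a placeholder, not the real test code):

  // Disabled until bug 1309637 is fixed; kept under `if (false)` so it can
  // simply be re-enabled later instead of being re-added.
  if (false) {
    // ... the private browsing cookie checks (lines 182-215) go here ...
  }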
Depends on: 1332809
Pushed by gkrizsanits@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/f6c9241b40ec
Followup for a typo in the manifest file. r=me
(In reply to Kris Maglione [:kmag] from comment #34)
> I'd rather add an `if (false)` to the beginning of that block than remove
> it, and add a comment about bug 1309637, but yes.

This is not going to work; this test is failing in quite a few ways. I'm afraid I must disable the entire test. Or, if you think this is not just a bug in the test but actually a broken feature important enough to hold back the landing, please let me know so I can plan accordingly.
Flags: needinfo?(kmaglione+bmo)
Only the private browsing parts are failing. There are other parts of the test file that deal with private browsing that can be disabled as well, but we can't disable the entire test.
Flags: needinfo?(kmaglione+bmo)
(In reply to Kris Maglione [:kmag] from comment #41)
> Only the private browsing parts are failing. There are other parts of the
> test file that deal with private browsing that can be disabled as well, but
> we can't disable the entire test.

Even this one? https://treeherder.mozilla.org/logviewer.html#?job_id=70963350&repo=mozilla-inbound&lineNumber=2967

I see two blocks where we open a privateWindow; I disabled them both, and this failure happens before those. Anyway, let's talk about it on Monday, and we'll sort out something for this.
No longer depends on: 1332809
Hm. No, I guess not. I was assuming that was related to the private browsing cookies, but from the screenshot, it looks like we're actually just winding up with an extra tab open during that test.

And it looks like that's probably a race in test_ext_contentscript_permission.html, which doesn't wait for its tab removal to succeed before ending the test.
Depends on: 1332868
Please nominate this for the release notes when it is ready to ride the trains.
Assignee: nobody → gkrizsanits
I duped all the leak bugs over, as they haven't shown up in a while, except for bug 1328374, which has happened a few times, around the time window when this landed.
https://hg.mozilla.org/mozilla-central/rev/aefa445b9c77
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
No longer depends on: 1289723
No longer depends on: 1324428
No longer depends on: 1328362
Depends on: 1275447
Depends on: 1254841
Depends on: 1337778
Depends on: 1341353
Depends on: 1402905