Open Bug 1383301 Opened 7 years ago Updated 2 months ago

Reland the third patch from bug 1379762, after figuring out why it causes problems

Categories

(Core :: DOM: Core & HTML, defect, P3)

53 Branch
defect

Tracking

()

People

(Reporter: bzbarsky, Unassigned)

References

Details

Attachments

(1 obsolete file)

It got backed out in bug 1382079.
Depends on: 1379762
Flags: needinfo?(bzbarsky)
OK, so I tried building rev 0faada5c2f30 locally and running "mach mochitest browser/extensions/onboarding/test/browser/browser_onboarding_tourset.js" on Mac.  It consistently fails for me by just timing out.  I get the same thing if I back out locally the the third part of bug 1379762.

I do get the leaked windows on try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=c87eef70a5bc63ac0c30d44e67cba3525f830db5

I have no idea why this test totally fails locally...
What timeouts are you running into? Are they the "Should load onboarding overlay - timed out after 50 tries" of bug 1383070? If so, you can revert https://hg.mozilla.org/mozilla-central/rev/f4378a5e6c7f from the latest mozilla-central to get rid of those timeouts. I just pushed to try with "Remove Performance::GetAsISupports()" backed out and "Bug 1379762 part 3" reapplied: https://treeherder.mozilla.org/#/jobs?repo=try&revision=eb3cca70d05e14ac661fd820bb57d0001249ff05

My local mac debug artifact build seems to be able to run the test… ? I'm using git/cinnabar at the equivalent 0faada5c2f30  commit:

commit c3af56a75b35189ad5ae1a794bafbd208d6dbe18 (HEAD)
Merge: 9abe987ff651 ae7d66928b1d
Author: Sebastian Hengst <archaeopteryx@coole-files.de>
Date:   Fri Jul 21 11:12:24 2017 +0200
    merge autoland to mozilla-central. r=merge a=merge

# HG changeset patch
# User Sebastian Hengst <archaeopteryx@coole-files.de>
# Date 1500628344 -7200
#      Fri Jul 21 11:12:24 2017 +0200
# Node ID 0faada5c2f308f101ab7c54a87b3dce80b97d0e3
# Parent  7d2e89fb92331d7e4296391213c1e63db628e046
# Parent  056aa9b5ece3b2758406f5855e3e547cdb72a973
merge autoland to mozilla-central. r=merge a=merge

.mozconfig:
export MOZ_DEBUG="1"
ac_add_options --enable-debug --enable-artifact-builds
bz, the try pushes look as expected with reverting "Bug 1159003 - Remove Performance::GetAsISupports(), r=bz".

Bug 1382079 leaked 2 about:newtab appears:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=eb3cca70d05e14ac661fd820bb57d0001249ff05
1b05308832e1 Revert "Bug 1382079 - Backed out changeset 5fc778386eb1 (Bug 1379762 part 3). r=bz"
db3ec356dcd3 Revert "Bug 1159003 - Remove Performance::GetAsISupports(), r=bz"
7ce557b85b61 latest mozilla-central

And as a sanity check that our fix for bug 1382079 by backing out "part 3" did indeed work:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1c62e3b2789cbb9edbe30b1389098410810c5069
ed0ef50796a6 Revert "Bug 1159003 - Remove Performance::GetAsISupports(), r=bz"
7ce557b85b61 latest mozilla-central

So bug 1159003 is hiding the leak fix/undo by causing onboarding tests to time out.

Instead of trying to reapply "part 3" on 0faada5c2f30, you can grab the latest m-c and back out https://hg.mozilla.org/mozilla-central/rev/f4378a5e6c7f
> Are they the "Should load onboarding overlay - timed out after 50 tries" of bug 1383070?

No.  They're just the browser comes up loads the new tab page, then nothing happens, then it dies.  Here's the full INFO output I see locally:

0 INFO *** Start BrowserChrome Test Results ***
1 INFO checking window state
2 INFO TEST-START | browser/extensions/onboarding/test/browser/browser_onboarding_tourset.js
3 INFO Entering test bound test_onboarding_default_new_tourset
4 INFO TEST-UNEXPECTED-FAIL | browser/extensions/onboarding/test/browser/browser_onboarding_tourset.js | Test timed out - 
5 INFO TEST-OK | browser/extensions/onboarding/test/browser/browser_onboarding_tourset.js | took 45080ms
6 INFO TEST-UNEXPECTED-FAIL | browser/extensions/onboarding/test/browser/browser_onboarding_tourset.js | Found a tab after previous test timed out: about:home - 
7 INFO checking window state
8 INFO Console message: [JavaScript Error: "TypeError: aBrowser is null" {file: "resource:///modules/ZoomUI.jsm" line: 62}]
GECKO(63481) | JavaScript error: resource:///modules/ZoomUI.jsm, line 62: TypeError: aBrowser is null
9 INFO Console message: [JavaScript Error: "TypeError: aBrowser is null" {file: "resource:///modules/ZoomUI.jsm" line: 62}]
10 INFO TEST-UNEXPECTED-FAIL | browser/extensions/onboarding/test/browser/browser_onboarding_tourset.js | Uncaught exception received from previously timed out test - Should load onboarding overlay - threw exception: TypeError: ContentTask is null
11 INFO TEST-START | Shutdown
12 INFO Browser Chrome Test Summary
13 INFO Passed:  0
14 INFO Failed:  3
15 INFO Todo:    0
16 INFO Mode:    e10s
17 INFO *** End BrowserChrome Test Results ***

Maybe this test doesn't run propely in non-packaged builds or something?  But I didn't think artifact builds were packaged...

> Instead of trying to reapply "part 3" on 0faada5c2f30, you can grab the latest m-c and back out https://hg.mozilla.org/mozilla-central/rev/f4378a5e6c7f

This will stop working once the backout of part 3 merges to m-c, no?  Or is the point that even without part 3 but with https://hg.mozilla.org/mozilla-central/rev/f4378a5e6c7f backed out the test still leaks?  Pretty confused now....
(In reply to Boris Zbarsky [:bz] from comment #4)
> 10 INFO TEST-UNEXPECTED-FAIL |
> browser/extensions/onboarding/test/browser/browser_onboarding_tourset.js |
> Uncaught exception received from previously timed out test - Should load
> onboarding overlay - threw exception: TypeError: ContentTask is null
Well that's suspiciously odd. It uses a ContentTask to determine if it should continue the test. Fischer, any ideas?

I checked out 0faada5c2f30 and made my .mozconfig only contain `ac_add_options --enable-debug`:
0 INFO *** Start BrowserChrome Test Results ***
1 INFO checking window state
2 INFO TEST-START | browser/extensions/onboarding/test/browser/browser_onboarding_tourset.js
3 INFO Entering test bound test_onboarding_default_new_tourset
4 INFO TEST-PASS | browser/extensions/onboarding/test/browser/browser_onboarding_tourset.js | has exact tour numbers - 
5 INFO TEST-PASS | browser/extensions/onboarding/test/browser/browser_onboarding_tourset.js | contain defined onboarding id - 
6 INFO TEST-PASS | browser/extensions/onboarding/test/browser/browser_onboarding_tourset.js | contain defined onboarding id - 
7 INFO TEST-PASS | browser/extensions/onboarding/test/browser/browser_onboarding_tourset.js | contain defined onboarding id - 
8 INFO TEST-PASS | browser/extensions/onboarding/test/browser/browser_onboarding_tourset.js | contain defined onboarding id - 
9 INFO TEST-PASS | browser/extensions/onboarding/test/browser/browser_onboarding_tourset.js | contain defined onboarding id - 
10 INFO TEST-PASS | browser/extensions/onboarding/test/browser/browser_onboarding_tourset.js | contain defined onboarding id - 
11 INFO Leaving test bound test_onboarding_default_new_tourset
12 INFO Entering test bound test_onboarding_custom_new_tourset
13 INFO TEST-PASS | browser/extensions/onboarding/test/browser/browser_onboarding_tourset.js | has exact tour numbers - 
14 INFO TEST-PASS | browser/extensions/onboarding/test/browser/browser_onboarding_tourset.js | contain defined onboarding id - 
15 INFO TEST-PASS | browser/extensions/onboarding/test/browser/browser_onboarding_tourset.js | contain defined onboarding id - 
16 INFO TEST-PASS | browser/extensions/onboarding/test/browser/browser_onboarding_tourset.js | contain defined onboarding id - 
17 INFO Leaving test bound test_onboarding_custom_new_tourset
18 INFO Entering test bound test_onboarding_custom_update_tourset
19 INFO TEST-PASS | browser/extensions/onboarding/test/browser/browser_onboarding_tourset.js | has exact tour numbers - 
20 INFO TEST-PASS | browser/extensions/onboarding/test/browser/browser_onboarding_tourset.js | contain defined onboarding id - 
21 INFO TEST-PASS | browser/extensions/onboarding/test/browser/browser_onboarding_tourset.js | contain defined onboarding id - 
22 INFO TEST-PASS | browser/extensions/onboarding/test/browser/browser_onboarding_tourset.js | contain defined onboarding id - 
23 INFO Leaving test bound test_onboarding_custom_update_tourset
24 INFO TEST-OK | browser/extensions/onboarding/test/browser/browser_onboarding_tourset.js | took 5168ms
25 INFO checking window state
26 INFO Console message: [JavaScript Error: "TypeError: aBrowser is null" {file: "resource:///modules/ZoomUI.jsm" line: 62}]
27 INFO Console message: [JavaScript Error: "TypeError: aBrowser is null" {file: "resource:///modules/ZoomUI.jsm" line: 62}]
28 INFO TEST-START | Shutdown
29 INFO Browser Chrome Test Summary
30 INFO Passed:  15
31 INFO Failed:  0
32 INFO Todo:    0
33 INFO Mode:    e10s
34 INFO *** End BrowserChrome Test Results ***


> This will stop working once the backout of part 3 merges to m-c, no?
Sorry it's confusing as there's multiple back outs going on and reverting of back outs. I'll try to summarize.

Current mozilla-central (7ce557b85b61, _with_ bug 1159003, *without* bug 1379762 part 3):
tests time out with bug 1383070 [1]

Current mozilla-central *without* bug 1159003 (*without* bug 1379762 part 3):
tests pass without leaking [2]

Current mozilla-central _with_ bug 1379762 part 3, *without* bug 1159003
tests leak with bug 1382079 [3]

[1] https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=7ce557b85b611536b69539a7c18d4834ffc92eea&filter-searchStr=linux64%20debug%20bc&selectedJob=116635956&filter-resultStatus=testfailed
[2] https://treeherder.mozilla.org/#/jobs?repo=try&revision=1c62e3b2789cbb9edbe30b1389098410810c5069
[3] https://treeherder.mozilla.org/#/jobs?repo=try&revision=eb3cca70d05e14ac661fd820bb57d0001249ff05
Flags: needinfo?(fliu)
(In reply to Ed Lee :Mardak from comment #5)
> (In reply to Boris Zbarsky [:bz] from comment #4)
> > 10 INFO TEST-UNEXPECTED-FAIL |
> > browser/extensions/onboarding/test/browser/browser_onboarding_tourset.js |
> > Uncaught exception received from previously timed out test - Should load
> > onboarding overlay - threw exception: TypeError: ContentTask is null
> Well that's suspiciously odd. It uses a ContentTask to determine if it
> should continue the test. Fischer, any ideas?
> 
The `ContentTask` should not be null. It is one of the global test utility helpers, which got hooked in with others at [1].

Here are my TRYs:
# In [2] and [3]:
  - The updates to the tests are a) try to wait for idles before checking since the onboarding dose many jobs in `requestIdleCallback` and b) only load about:newtab in the browser_onboarding_tourset.js (loaded about:home before, which might not be need).

  - Didn't see the timeout of "Should load onboarding overlay - timed out after 50 tries"

  - Saw fee "This test exceeded the timeout threshold. It should be rewritten or split up....". This could has do to with the bug 1382079 comment 31. Now Linux debug is running quite slow. We have already let only 2~3 test cases in each file. It could be hard to manage one test case in each file. Updating the test cases is kind of not an option because we have to reload page a few times to test notifications. I probably would increase `requestLongerTimeout` even to 3.

  - Didn't see the windows leak
 
  - For other failures, shouldn't have to do with here because already happened before and all the updates only belongs to the onboarding test files.

# In [4]:
  - This commit based on the timeout fix above and do `requestLongerTimeout(3)`.
  - Didn't see the timeout of "Should load onboarding overlay - timed out after 50 tries"
  - Didn't see "This test exceeded the timeout threshold. It should be rewritten or split up...."
  - Interestingly, it brought back bug 1379762 part 3 but didn't see the window leak issue.

I am going to submit a patch to fix the timeout issue first.

[1] https://dxr.mozilla.org/mozilla-central/rev/7d2e89fb92331d7e4296391213c1e63db628e046/testing/mochitest/browser-test.js#688
[2] TRY on the timeout fix not on the latest mc: https://treeherder.mozilla.org/#/jobs?repo=try&revision=32e56e55769b0d3b11d33724da748b6a282f99c5
[3] TRY on the timeout fix on the latest mc: https://treeherder.mozilla.org/#/jobs?repo=try&revision=0800fd36e2b4530f1a3ac8fcddc1c2670b91839d
[4] https://treeherder.mozilla.org/#/jobs?repo=try&revision=4b52f0083e4d4234f0a72cc7ec85eb6abae04a4f
(In reply to Fischer [:Fischer] from comment #6)
> I am going to submit a patch to fix the timeout issue first.
> 
To be clear, the patch would be submitted in the bug 1383070.
> Current mozilla-central _with_ bug 1379762 part 3, *without* bug 1159003
> tests leak with bug 1382079 [3]

Just so we're clear, what I was seeing in comment 1 is that the tests leak "as expected" on try, but just time out locally.  So the problem is not that I can't reproduce the leak on try, it's that I can't debug it!

Anyway, just to make sure I just checked out rev 7ce557b85b61, backed out the part of bug 1159003 that your try push backs out (just the second diff from that bug, not everything from it, right?), applied part 3 of bug 1379762 locally.  Specifically, I did:

1) hg up -C -r 7ce557b85b61
2) hg qimport -n back-out-1159003 https://hg.mozilla.org/try/raw-rev/db3ec356dcd35c2ce4f9fc2d8918a2051c2d01ee
3) hg qpush
4) hg qimport -n restore-my-part-3 https://hg.mozilla.org/try/raw-rev/1b05308832e120543f8a2e6b94fdb6cd83cbb6a2
5) hg qpush
6) build (opt, not debug, fwiw, because running debug mochitests is a huge pain)
7) mach mochitest browser/extensions/onboarding/test/browser/browser_onboarding_tourset.js

The test just times out, in exactly the way I describe in comment 2 and comment 4.

I just checked with a debug build (same steps, but build debug in step 6), and it also times out in exactly the same way...
I'm not really sure what to suggest other than a clean .mozconfig?

Maybe one last sanity check is just running a different mochitest that also uses `ContentTask`:
./mach mochitest browser_aboutHome.js

As Fischer said in comment 6, ContentTask should be a global like Assert and other test helpers.

I followed your steps in comment 8 with no .mozconfig, and the mochitest ran fine. Here's my hg log:

> $  hg log --style compact --limit 3
> 370365[qtip,restore-my-part-3,tip]   ad1af6cf1b29   2017-07-22 08:11 -0700   edilee
>  Revert "Bug 1382079 - Backed out changeset 5fc778386eb1 (Bug 1379762 part 3). r=bz"
> 
> 370364[back-out-1159003,qbase]   5a1cdd4e607d   2017-07-22 08:10 -0700   edilee
>  Revert "Bug 1159003 - Remove Performance::GetAsISupports(), r=bz"
> 
> 370363[qparent]   7ce557b85b61   2017-07-22 13:12 +0200   archaeopteryx
>  Backed out changeset 55070d5d1a59 (bug 1362617) for breaking OS X L10n nightlies. r=backout a=backout
I've had some luck using a [One Click Loaner)(https://ahal.ca/blog/2016/taskcluster-interactive-loaner/) to debug test problems that I haven't been able to reproduce locally.
Ah, I found what the problem is.  It's the stupid sandboxing thing again, and mach not setting things up correctly to run tests...  If I run with MOZ_DISABLE_CONTENT_SANDBOX=t then the tests stop timing out.

For anyone else following along, the patches that finally landed in bug 1380690 should help with that, but in the meantime it was a giant waste of time for all of us.  :(
Flags: needinfo?(bzbarsky)
Flags: needinfo?(fliu) → needinfo?(bzbarsky)
So my suspicion is that the problem here was the leak from bug 1159003 (from the first patch).  The followup patch in that bug could make the leak go away in some cases, which may have been what happened here.  My load blocker changes could change timing, which could also make the leak appear or disappear, depending...

I'm going to make a try push with just my patch on top of m-c tip and if that passes try I will simply reland it...
Priority: -- → P3
Hi Edgar, could you help us move this forward?
Flags: needinfo?(echen)
Flags: needinfo?(echen)
Component: DOM → DOM: Core & HTML
Flags: needinfo?(bzbarsky)
Severity: normal → S3
Attachment #9381280 - Attachment is obsolete: true
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: