
75% permafail on WinXP test_outerHTML.xhtml,test_picture_mutations.html,test_picture_pref.html,test_pointerPreserves3D.html,test_pointerPreserves3DClip.html,test_resource_timing.html, | application timed out after 330 seconds with no output

RESOLVED FIXED in Firefox 49

Status

Core
DOM
RESOLVED FIXED
a year ago
a year ago

People

(Reporter: aryx, Assigned: RyanVM)

Tracking

(Blocks: 1 bug, {intermittent-failure})

49 Branch
mozilla50
intermittent-failure

Firefox Tracking Flags

(firefox48 unaffected, firefox49 fixed, firefox50 fixed)

Details

https://treeherder.mozilla.org/logviewer.html#?job_id=28321147&repo=mozilla-inbound
Summary: Intermittent test_picture_pref.html | application timed out after 330 seconds with no output → Intermittent test_picture_pref.html or test_pointerPreserves3D.html or test_resource_timing.html | application timed out after 330 seconds with no output, nearly-permaorange on Windows XP pgo
Duplicate of this bug: 1274742
Seems like there's a decent chance this is somehow related to bug 1274450.
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&group_state=expanded&fromchange=47ced506c7a32e8ca807978071a536f64c8332c8&filter-tier=1&filter-searchStr=Windows%20XP%20pgo%20M(3)
(In reply to David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) from comment #2)
> Seems like there's a decent chance this is somehow related to bug 1274450.

That said, when both were present, which failure occurred on which push appeared to be unrelated, although they did both occur on the same push once.
Doing bisection in:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&group_state=expanded&fromchange=159d2986681071d6549031afaff91911f8080a27&tochange=44fa05b72b6fb2ef0edfe73e4767154ab99381bc&filter-tier=1&filter-searchStr=Windows%20XP%20pgo%20M(3)
I realize this might just be a new form of bug 1273758.
Nope, the bisection (see link in comment 5) indicates that the regression was in this window:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=9b44a0d216be66c4ac1a05d344a838a5056692e9&tochange=00218374a90cfbb6b66a9a1bf8e5483efcb18661
which I'm pretty sure means one of the changesets from bug 1273070.
Blocks: 1273070
Flags: needinfo?(bkelly)
Summary: Intermittent test_picture_pref.html or test_pointerPreserves3D.html or test_resource_timing.html | application timed out after 330 seconds with no output, nearly-permaorange on Windows XP pgo → 75% permafail test_picture_pref.html or test_pointerPreserves3D.html or test_resource_timing.html | application timed out after 330 seconds with no output, nearly-permaorange on Windows XP pgo
Summary: 75% permafail test_picture_pref.html or test_pointerPreserves3D.html or test_resource_timing.html | application timed out after 330 seconds with no output, nearly-permaorange on Windows XP pgo → 75% permafail on WinXP PGO test_picture_pref.html or test_pointerPreserves3D.html or test_resource_timing.html | application timed out after 330 seconds with no output
Duplicate of this bug: 1274702
Summary: 75% permafail on WinXP PGO test_picture_pref.html or test_pointerPreserves3D.html or test_resource_timing.html | application timed out after 330 seconds with no output → 75% permafail on WinXP PGO test_picture_pref.html,test_pointerPreserves3D.html,test_pointerPreserves3DClip.html,test_resource_timing.html | application timed out after 330 seconds with no output
Feel free to back out bug 1273070 to be safe, but I don't think that these can be related:

1) All the test code I added in dom/tests/mochitest/fetch runs in a separate browser instance from dom/tests/mochitest/general.
2) None of the tests in dom/tests/mochitest/general execute any fetch code.  I added asserts and ran the tests locally to verify this.
Flags: needinfo?(bkelly)
Did mozilla-build change around this time?

Looking at the build for the previous commit:

  http://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-inbound-win32-pgo/1463866923/mozilla-inbound-win32-pgo-bm91-build1-build177.txt.gz

I see this output in the log:

14:42:59     INFO -  Executing: ['c:\\mozilla-build\\python27\\python.exe', 'C:/mozilla-build/tooltool.py', '--authentication-file', 'c:\\builds\\relengapi.tok', '-c', 'c:/builds/tooltool_cache', '--url', 'https://api.pub.build.mozilla.org/tooltool/', '--overwrite', '-m', 'c:\\builds\\moz2_slave\\m-in-w32-pgo-00000000000000000\\build\\src\\browser/config/tooltool-manifests/win32/releng.manifest', 'fetch']
14:43:03     INFO -  INFO - rm tree: rustc
14:43:04     INFO -  INFO - untarring "rustc.tar.bz2"
14:43:12     INFO -  INFO - rm tree: sccache
14:43:13     INFO -  INFO - untarring "sccache.tar.bz2"
14:43:13     INFO -  INFO - rm tree: vs2015u2
14:43:20     INFO -  INFO - unzipping "vs2015u2.zip"
14:43:35     INFO - Return code: 0

On my commit where the failures started I see:

11:40:10     INFO -  Executing: ['c:\\mozilla-build\\python27\\python.exe', 'C:/mozilla-build/tooltool.py', '--authentication-file', 'c:\\builds\\relengapi.tok', '-c', 'c:/builds/tooltool_cache', '--url', 'https://api.pub.build.mozilla.org/tooltool/', '--overwrite', '-m', 'c:\\builds\\moz2_slave\\m-in-w32-pgo-00000000000000000\\build\\src\\browser/config/tooltool-manifests/win32/releng.manifest', 'fetch']
11:40:10     INFO -  INFO - File mozmake.exe retrieved from local cache c:/builds/tooltool_cache
11:40:10     INFO -  INFO - File rustc.tar.bz2 not present in local cache folder c:/builds/tooltool_cache
11:40:10     INFO -  INFO - Attempting to fetch from 'https://api.pub.build.mozilla.org/tooltool/'...
11:40:15     INFO -  INFO - File rustc.tar.bz2 fetched from https://api.pub.build.mozilla.org/tooltool/ as c:\builds\moz2_slave\m-in-w32-pgo-00000000000000000\build\src\tmpxs8cbr
11:40:19     INFO -  INFO - File sccache.tar.bz2 retrieved from local cache c:/builds/tooltool_cache
11:40:36     INFO -  INFO - File vs2015u2.zip retrieved from local cache c:/builds/tooltool_cache
11:40:41     INFO -  INFO - File integrity verified, renaming tmpxs8cbr to rustc.tar.bz2
11:40:41     INFO -  INFO - Updating local cache c:/builds/tooltool_cache...
11:40:41     INFO -  INFO - Local cache c:/builds/tooltool_cache updated with rustc.tar.bz2
11:40:41     INFO -  INFO - untarring "sccache.tar.bz2"
11:40:55     INFO -  INFO - unzipping "vs2015u2.zip"
11:43:22     INFO -  INFO - untarring "rustc.tar.bz2"
11:43:53     INFO - Return code: 0

I'm not saying this output is exactly the cause, but it might suggest something else out-of-band changed here.
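
For a rough comparison of the two tooltool runs quoted above, the elapsed time between the `Executing` line and the `Return code: 0` line can be computed from the log timestamps. This is only a sketch: the helper name `elapsedSeconds` is hypothetical, and it assumes both timestamps fall on the same day.

```javascript
// Hypothetical helper: seconds between two HH:MM:SS timestamps on the same day.
function elapsedSeconds(start, end) {
  const toSeconds = (stamp) => {
    const [h, m, s] = stamp.split(":").map(Number);
    return h * 3600 + m * 60 + s;
  };
  return toSeconds(end) - toSeconds(start);
}

// Timestamps copied from the two log excerpts above.
console.log(elapsedSeconds("14:42:59", "14:43:35")); // 36 (previous commit)
console.log(elapsedSeconds("11:40:10", "11:43:53")); // 223 (commit where the failures started)
```

This only quantifies the difference between the two excerpts; it does not by itself identify a cause.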

Ryan, do you know what is going on with mozilla-build here?
Flags: needinfo?(ryanvm)
9 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-inbound: 8
* mozilla-central: 1

Platform breakdown:
* windowsxp: 9

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1274741&startday=2016-05-16&endday=2016-05-22&tree=all
(Assignee)

Comment 12

a year ago
The MozillaBuild package I maintain has very little to do with what we do in CI at the moment. Not sure what might have changed in the RelEng world last week, maybe catlee has an idea.
Flags: needinfo?(ryanvm) → needinfo?(catlee)
Pretty sure nothing has changed on XP in ages.
Flags: needinfo?(catlee)
Is this hidden on treeherder or something?  Brasstacks shows it dropping back down close to zero.
15 automation job failures were associated with this bug yesterday.

Repository breakdown:
* mozilla-inbound: 7
* fx-team: 5
* mozilla-central: 3

Platform breakdown:
* windowsxp: 15

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1274741&startday=2016-05-25&endday=2016-05-25&tree=all
It's back on both inbound and fx-team.

Windows XP opt and pgo M(3) often fail in one of these tests, which are scheduled to run one after another:

test_picture_mutations.html https://treeherder.mozilla.org/logviewer.html#?job_id=28915923&repo=mozilla-inbound

test_performance_timeline.html https://treeherder.mozilla.org/logviewer.html#?job_id=28965078&repo=mozilla-inbound

test_performance_now.html https://treeherder.mozilla.org/logviewer.html#?job_id=28965079&repo=mozilla-inbound

test_outerHTML.xhtml https://treeherder.mozilla.org/logviewer.html#?job_id=28964759&repo=mozilla-inbound

(In reply to David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) from comment #6)
> I realize this might just be a new form of bug 1273758.
test_paste_selection.html runs before most of these tests (but after test_outerHTML.xhtml) and uses the clipboard.
Summary: 75% permafail on WinXP PGO test_picture_pref.html,test_pointerPreserves3D.html,test_pointerPreserves3DClip.html,test_resource_timing.html | application timed out after 330 seconds with no output → 75% permafail on WinXP test_outerHTML.xhtml,test_picture_mutations.html,test_picture_pref.html,test_pointerPreserves3D.html,test_pointerPreserves3DClip.html,test_resource_timing.html, | application timed out after 330 seconds with no output
18 automation job failures were associated with this bug yesterday.

Repository breakdown:
* mozilla-inbound: 17
* fx-team: 1

Platform breakdown:
* windowsxp: 18

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1274741&startday=2016-05-28&endday=2016-05-28&tree=all
93 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-inbound: 51
* fx-team: 27
* mozilla-central: 7
* ash: 6
* mozilla-aurora: 2

Platform breakdown:
* windowsxp: 90
* windows7-32: 3

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1274741&startday=2016-05-23&endday=2016-05-29&tree=all
17 automation job failures were associated with this bug yesterday.

Repository breakdown:
* mozilla-inbound: 9
* mozilla-central: 4
* fx-team: 4

Platform breakdown:
* windowsxp: 17

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1274741&startday=2016-05-30&endday=2016-05-30&tree=all
73 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-inbound: 29
* fx-team: 26
* mozilla-central: 12
* ash: 4
* try: 2

Platform breakdown:
* windowsxp: 72
* windows7-32: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1274741&startday=2016-05-30&endday=2016-06-05&tree=all
79 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-inbound: 37
* fx-team: 24
* mozilla-central: 10
* mozilla-aurora: 6
* autoland: 2

Platform breakdown:
* windowsxp: 79

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1274741&startday=2016-06-06&endday=2016-06-12&tree=all
(Assignee)

Comment 22

a year ago
I recently bisected bug 1273070 as the cause for extremely frequent WinXP e10s DOM mochitest timeouts on Ash as well. I won't file a new bug for it since it looks like the same basic problem as this bug. It hits at least half of the time.

https://treeherder.mozilla.org/logviewer.html#?job_id=22371957&repo=try#L9098

Ben, the hits on Ash are with regular Windows opt builds. You can also run XP mochitest-e10s-3 on Try now without having to do anything special (try: -b o -p win32 -u mochitest-e10s-3[Windows XP]), in case it helps in debugging without having to run PGO.
Flags: needinfo?(bkelly)
(Assignee)

Comment 23

a year ago
In case you want to look at more logs:
https://treeherder.mozilla.org/#/jobs?repo=ash&filter-searchStr=xp%20m-e10s(3)&fromchange=f70b8561b4796217c3328dfc97b61e8ae934c1dd
Ryan, can you try just backing out the test changes in P2 from bug 1273070 in a try push?  I'd like to isolate whether the problem comes from adding the tests or from the DOM code changes.

I would NI, but you have those turned off. :-)
Flags: needinfo?(bkelly)
Also, doing some try runs with full timestamps would be great.  The buffer-and-dump logs hide the timing here, which unfortunately seems relevant.
(Assignee)

Comment 26

a year ago
(In reply to Ben Kelly [:bkelly] from comment #25)
> Also doing some try runs with full timestamps would be great.  The
> buffer-and-dump logs hides the timing here which unfortunately seems
> relevant.

I don't think there's a way to do that, unfortunately. I'll run some Try pushes to at least isolate which of the two patches from bug 1273070 was at fault, though.
Flags: needinfo?(ryanvm)
(Assignee)

Comment 27

a year ago
BTW, if it ends up being Part 1 that's at fault, it looks like that's not going to back out cleanly at this point. There's been some significant-looking work that's landed on Fetch.cpp since then.
https://hg.mozilla.org/mozilla-central/log/default/dom/fetch/Fetch.cpp
Flags: needinfo?(ryanvm)
(Assignee)

Comment 28

a year ago
Looks like it was indeed the test changes that are causing this (reverting to rev 5733b66fdedf results in no timeouts) for at least WinXP mochitest-e10s-3. I'll try disabling the test on m-c tip next to hopefully verify.

This of course still raises the question of why a test from one directory is affecting tests in another one, given that we're supposed to have a clean Firefox instance between each one. I guess service workers leave things running in the background or something? Are we properly shutting everything down at the end of the fetch tests?
(Assignee)

Comment 29

a year ago
Looks good!
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c154e61c9b4e9fe2d6db9a6fa985c11b9a4756b7

rs=you, Ben?
Flags: needinfo?(bkelly)
Well, I don't want to back it out completely.  Can we do something like this to only disable on windows instead?

    .then(function() {
      // XXX This makes other, unrelated test suites fail. Follow up bug 123.
      let isWin = navigator.platform.indexOf("Win") == 0;
      return isWin ? undefined : nestedWorkerTest();
    })

Because otherwise we have zero test coverage for this particular code.
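
A self-contained sketch of the same skip pattern, with the platform string passed in so it can be exercised outside a browser. Here `nestedWorkerTest` is a stub standing in for the real test, and `isWindowsPlatform` and `runNestedWorkerStep` are hypothetical names used for illustration, not part of the patch that landed.

```javascript
// Hypothetical stub standing in for the real nestedWorkerTest() in the fetch mochitests.
function nestedWorkerTest() {
  return Promise.resolve("nestedWorkerTest ran");
}

// True when the platform string starts with "Win",
// equivalent to navigator.platform.indexOf("Win") == 0 in the snippet above.
function isWindowsPlatform(platform) {
  return platform.startsWith("Win");
}

// Runs the nested worker step unless the platform is Windows,
// in which case the chain resolves to undefined and the step is skipped.
function runNestedWorkerStep(platform) {
  return Promise.resolve().then(function () {
    return isWindowsPlatform(platform) ? undefined : nestedWorkerTest();
  });
}
```

In a browser the caller would pass `navigator.platform`; every non-Windows platform keeps its coverage of the nested worker code path.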
Flags: needinfo?(bkelly) → needinfo?(ryanvm)
22 automation job failures were associated with this bug yesterday.

Repository breakdown:
* mozilla-inbound: 16
* fx-team: 6

Platform breakdown:
* windowsxp: 22

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1274741&startday=2016-06-18&endday=2016-06-18&tree=all
95 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-inbound: 55
* fx-team: 32
* mozilla-central: 7
* ash: 1

Platform breakdown:
* windowsxp: 95

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1274741&startday=2016-06-13&endday=2016-06-19&tree=all
(Assignee)

Updated

a year ago
Blocks: 1281212
(Assignee)

Comment 33

a year ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=8ee23c78e0211f8932aac785f752550d8ed49cd6&group_state=expanded
Flags: needinfo?(ryanvm)

Comment 34

a year ago
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/0edc88aff987
Skip the Fetch nestedWorkerTest on Windows for causing frequent WinXP timeouts in other DOM mochitests. r=bkelly
(Assignee)

Comment 35

a year ago
Please keep an eye on WinXP PGO M(3) over the next few days and confirm that this is indeed resolved by the push above. Try says it works for M-e10s(3) anyway, but I haven't tried PGO. Landed with r=bkelly per IRL discussion in London last week.
Flags: needinfo?(wkocher)
Flags: needinfo?(cbook)
Flags: needinfo?(aryx.bugmail)
Keywords: leave-open
Will do, thanks for the heads-up.
Flags: needinfo?(cbook)

Comment 37

a year ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/0edc88aff987
Flags: needinfo?(wkocher)
Thanks, this looks fixed on inbound and central.
Status: NEW → RESOLVED
Last Resolved: a year ago
Flags: needinfo?(aryx.bugmail)
Resolution: --- → FIXED
(Assignee)

Comment 39

a year ago
Thanks for the confirmation. I'll get this uplifted to Aurora soonish.
Assignee: nobody → ryanvm
status-firefox48: --- → unaffected
status-firefox50: --- → fixed
Keywords: leave-open
Target Milestone: --- → mozilla50
(Assignee)

Comment 40

a year ago
bugherder uplift
https://hg.mozilla.org/releases/mozilla-aurora/rev/1e6a6a67f97b
status-firefox49: affected → fixed
5 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* fx-team: 3
* mozilla-aurora: 2

Platform breakdown:
* windowsxp: 5

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1274741&startday=2016-06-20&endday=2016-06-26&tree=all