Closed Bug 1453007 Opened 6 years ago Closed 5 years ago

Frequent T-P-e10s(g2) failures on Linux and OS X: Task killed because maxRunTime was exceeded / Automation Error: mozprocess timed out after 3600 seconds running / damp | webconsole/openwithcache.js: Test timed out

Categories

(Testing :: Talos, defect)

Version 3
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: aryx, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: intermittent-failure, regression, Whiteboard: [stockwell disabled])

Attachments

(1 file)

Hmmm

08:10:02     INFO -  PID 1312 | TEST-UNEXPECTED-FAIL | damp | webconsole/openwithcache.js: Test timed out

:ochameau, as the owner of DAMP, can you please investigate? Thanks!
Flags: needinfo?(rwood) → needinfo?(poirot.alex)
Note that thanks to bug 1444826, we get much better error logged into PerfHerder.
So that we can file specific bugs like this one!

Now let's see what going on with this console test...
(In reply to Sebastian Hengst [:aryx][:archaeopteryx] (needinfo on intermittent or backout) from comment #0)
> There are more webconsole/openwithcache.js failures available in this
> central-as-beta simulation:
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=41356ce53ff617857225cedc97b0254ebde4dd9f&group_state=e
> xpanded&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-
> resultStatus=exception&filter-resultStatus=retry&filter-
> resultStatus=usercancel&filter-resultStatus=runnable&selectedJob=172833082

Is it MacOS only?
It seems to be passing on Linux, it fails later, during tps (which is run after DAMP), due to bug 1443130.

(In reply to Sebastian Hengst [:aryx][:archaeopteryx] (needinfo on intermittent or backout) from comment #2)
> The failures start with
> https://hg.mozilla.org/mozilla-central/
> pushloghtml?changeset=db03e2cdae9ab05482f98e3c567896234c66e226

Could it be that bug 1448077 breaks something only on MacOS?
Flags: needinfo?(poirot.alex)
Thanks :ochameau.

Yes definitely looks like two different issues here,

1) this issue, 'TEST-UNEXPECTED-FAIL | damp | webconsole/openwithcache.js: Test timed out', which yes seems to only happen on OSX and on the talos profiling nightly job. So an issue with running g2 with --geckoProfiling on, on OSX.

Yes possibly bug 1448077 since that is also in the changeset. :julian, can you take a peak please and see if it's related to this new failure? Thanks!]

And,

2) g2 exceeding the total task run time, which is Bug 1443130, and happens in the nightly talos profiling job

Which we will resolve in that bug by extending the max task runtime, and I"ll make a patch in that bug.
Flags: needinfo?(jdescottes)
For now I can't reproduce the issue locally, and I also don't understand why Bug 1448077 would make it fail.

I just added a few g2 with geckoprofile on the last try push I did before landing:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=fed11a87970988ef70ba86da3ec24238f923c91f&selectedJob=172929836

They are all green, so maybe it's something else in the changelog?  

Here is an extract from a failure log (from in https://treeherder.mozilla.org/#/jobs?repo=try&revision=41356ce53ff617857225cedc97b0254ebde4dd9f&selectedJob=172862156)

04:12:28     INFO -  PID 579 | Executing test 'webconsole/openwithcache.js'
04:12:28     INFO -  PID 579 |
[.... Removed some lines here to make it easier to read]
04:12:28     INFO -  PID 579 |
04:12:29     INFO -  PID 579 | Assertion failure
04:12:29     INFO -  PID 579 | assert@resource:///modules/AsyncTabSwitcher.jsm:447:36
04:12:29     INFO -  PID 579 | finish@resource:///modules/AsyncTabSwitcher.jsm:300:5
04:12:29     INFO -  PID 579 | postActions@resource:///modules/AsyncTabSwitcher.jsm:609:7
04:12:29     INFO -  PID 579 | onUnloadTimeout@resource:///modules/AsyncTabSwitcher.jsm:623:5
04:12:29     INFO -  PID 579 | queueUnload/this.unloadTimer<@resource:///modules/AsyncTabSwitcher.jsm:925:44
04:12:29     INFO -  PID 579 | Unable to read VR Path Registry from /Users/cltbld/Library/Application Support/OpenVR/.openvr/openvrpaths.vrpath
04:12:29     INFO -  PID 579 | 2018-04-10 04:12:29.286 plugin-container[637:11985] *** CFMessagePort: bootstrap_register(): failed 1100 (0x44c) 'Permission denied', port = 0x9237, name = 'com.apple.tsm.portname'
04:12:29     INFO -  PID 579 | See /usr/include/servers/bootstrap_defs.h for the error codes.
04:17:28     INFO -  PID 579 | TEST-UNEXPECTED-FAIL | damp | webconsole/openwithcache.js: Test timed out

It mentions AsyncTabSwitcher, which seems to have been updated in Bug 1451460, which is also in the pushlog.
And this pattern seems to repeat itself for all the failures that mention openwithcache.

Mike, do you think this failure could be linked to one of your patches?
Flags: needinfo?(jdescottes) → needinfo?(mconley)
Try after Bug 1451460 : https://treeherder.mozilla.org/#/jobs?repo=try&revision=f50dcba03c34cccc89a477a5510982a836949ceb
Try before Bug 1451460: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d153c14f7b7d2b682bef956acfafd3041e0e2bb2

So it looks like the timeouts are linked to Bug 1451460. But interestingly they all failed after a very long time, and I didn't see any of the "damp | webconsole/openwithcache.js: Test timed out" that can be found on the other runs mentioned in this bug.
Comment on attachment 8967442 [details] [diff] [review]
skip tps on osx, for profiling skip tps on linux/osx

Review of attachment 8967442 [details] [diff] [review]:
-----------------------------------------------------------------

R+ (with the landing of the patch in Bug 1453720 first of course)
Attachment #8967442 - Flags: review?(rwood) → review+
Depends on: 1453720
Pushed by jmaher@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/5e672478d592
skip tps on osx and linux/osx for profiling. r=rwood
Keywords: leave-open
Whiteboard: [stockwell disabled]
Sorry it's taken me so long to get to this. :(

Yeah, we're failing an assertion in the AsyncTabSwitcher, and that would definitely cause the test to time out. We should try to figure out why the assertion is failing.
Flags: needinfo?(mconley)
ni?ing myself to see if I can get some logging in here to see what's going on.
Flags: needinfo?(mconley)

No reports for 8 months.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME

Bugbug thinks this bug is a regression, but please revert this change in case of error.

Keywords: regression

Apologies - declaring needinfo bankruptcy on needinfo's greater than 2 years old.

Flags: needinfo?(mconley)
See Also: → 1873527
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: