[e10s] Increased number of hangs in navigate() since patches from bug 1198381 have landed

RESOLVED FIXED in Firefox 52

Status

Testing
Marionette
RESOLVED FIXED
11 months ago
11 months ago

People

(Reporter: whimboo, Unassigned)

Tracking

({regression})

Version 3
mozilla52
regression
Points:
---

Firefox Tracking Flags

(firefox51 unaffected, firefox52 fixed)

(Reporter)

Description

11 months ago
Investigation of bug 1312633 has shown that we have had an increased number of hangs in our Marionette tests since yesterday. Looking at Treeherder, it appears that things got worse with the push of the patches for bug 1198381.

So we see deadlocks for threads like:

1477358366605	Marionette	TRACE	conn383 -> [0,5,"get",{"url":"http://127.0.0.1:49334/javascriptPage.html"}]
--DOMWINDOW == 43 (11D5C400) [pid = 988] [serial = 78] [outer = 00000000] [url = about:blank]
[Child 3760] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005: file c:/builds/moz2_slave/autoland-w32-d-000000000000000/build/src/toolkit/xre/nsXREDirProvider.cpp, line 1703
[Child 3760] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005: file c:/builds/moz2_slave/autoland-w32-d-000000000000000/build/src/xpcom/base/nsSystemInfo.cpp, line 116
++DOCSHELL 09F2E800 == 1 [pid = 3760] [id = 1]
++DOMWINDOW == 1 (09F2F000) [pid = 3760] [serial = 1] [outer = 00000000]
++DOMWINDOW == 2 (0ADDA400) [pid = 3760] [serial = 2] [outer = 09F2F000]
[Child 3760] WARNING: site security information will not be persisted: file c:/builds/moz2_slave/autoland-w32-d-000000000000000/build/src/security/manager/ssl/nsSiteSecurityService.cpp, line 268
###!!! ERROR: Potential deadlock detected:
=== Cyclical dependency starts at
--- Mutex : nsThread.mLock (currently acquired)
 calling context
  [stack trace unavailable]

--- Next dependency:
--- Mutex : mozilla.ipc.MessageChannel.mMonitor (currently acquired)
 calling context
  [stack trace unavailable]

=== Cycle completed at
--- Mutex : nsThread.mLock (currently acquired)
 calling context
  [stack trace unavailable]

###!!! Deadlock may happen NOW!
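
To illustrate what a report like the one above is describing, here is a minimal Python sketch of lock-order cycle detection. It is not the detector that produced this output; the class and method names are hypothetical, and it assumes the report means that one code path takes nsThread.mLock and then mozilla.ipc.MessageChannel.mMonitor while another takes them in the opposite order.

```python
from collections import defaultdict


class LockOrderChecker:
    """Hypothetical helper: records lock acquisition order and flags cycles."""

    def __init__(self):
        # edges[a] holds every lock that was acquired while `a` was held.
        self.edges = defaultdict(set)

    def _reaches(self, start, target):
        # Depth-first search: is `target` reachable from `start`?
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges[node])
        return False

    def acquire(self, held, wanted):
        """Note that `wanted` is taken while `held` is already held.

        Returns True if this ordering closes a cycle, i.e. some earlier
        path already took `held` (directly or indirectly) after `wanted`.
        """
        cycle = self._reaches(wanted, held)
        self.edges[held].add(wanted)
        return cycle


checker = LockOrderChecker()
# Assumed first ordering: thread lock, then the IPC channel monitor.
first = checker.acquire("nsThread.mLock", "mozilla.ipc.MessageChannel.mMonitor")
# Assumed second ordering: the reverse, which closes the cycle.
second = checker.acquire("mozilla.ipc.MessageChannel.mMonitor", "nsThread.mLock")
print(first, second)
```

A checker of this kind reports a potential deadlock as soon as the second ordering is observed, even if the two code paths never actually block each other in that run.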


The patch on bug 1198381 made some changes to the handling of threads. So Andreas, can you please check if that could be the cause? Thanks.
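
For anyone reading the log above, the TRACE line that precedes the hang can be decoded with a short Python sketch. The function and field names are hypothetical. It assumes the line consists of whitespace-separated timestamp, logger name and level, followed by a connection id, an arrow and a JSON array, and that the array follows the Marionette command layout of [type, message id, command name, parameters].

```python
import json
from typing import NamedTuple


class TraceEntry(NamedTuple):
    timestamp_ms: int
    logger: str
    level: str
    connection: str
    arrow: str
    packet: list


def parse_trace_line(line: str) -> TraceEntry:
    # Split on whitespace at most three times so the payload stays intact.
    timestamp, logger, level, rest = line.split(None, 3)
    # The remainder is "<connection> <arrow> <JSON packet>".
    connection, arrow, payload = rest.split(" ", 2)
    return TraceEntry(
        int(timestamp), logger, level, connection, arrow, json.loads(payload)
    )


# The TRACE line from the report, with tabs between the leading fields.
line = (
    '1477358366605\tMarionette\tTRACE\tconn383 -> '
    '[0,5,"get",{"url":"http://127.0.0.1:49334/javascriptPage.html"}]'
)
entry = parse_trace_line(line)

# Assumed packet layout: [type, message id, command name, parameters].
msg_type, msg_id, command, params = entry.packet
print(command, params["url"])
```

Under those assumptions, the last command logged before the deadlock warning is a "get" for javascriptPage.html, which matches the navigate() hang in the bug summary.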
(Reporter)

Updated

11 months ago
Blocks: 1312633

Comment 1

11 months ago
Setting needinfo for Andreas.
Flags: needinfo?(afarre)
(Reporter)

Comment 2

11 months ago
It looks like we now have this kind of failure for nearly every Mn test job on Windows 7 VM:

https://treeherder.mozilla.org/#/jobs?repo=autoland&filter-searchStr=mn%20windows%207%20vm&bugfiler&fromchange=2a52b9538af4eb2605ec80429a84a43665ae2587&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=running&filter-resultStatus=pending&filter-resultStatus=runnable&selectedJob=5620203
Summary: Increased number of hangs in navigate() since patches from bug 1198381 have landed → [e10s] Increased number of hangs in navigate() since patches from bug 1198381 have landed
(Reporter)

Comment 3

11 months ago
Comparing different branches (mozilla-central, autoland, mozilla-inbound, and fx-team), they all seem to show similar behavior, although on some of them the changes haven't been tested yet, so the Mn-e10s jobs are still green. We will know more soon.

Carsten backed out the patch on mozilla-central and will merge the backout around to the other integration branches. So shall we keep this bug open, or close it given that the offending patch is no longer present?
Blocks: 1198381
Keywords: regressionwindow-wanted
Whiteboard: [regression from bug 1198381?]

Comment 4

11 months ago
If we can see that the erroneous behaviour disappears then we can close it, and I'll keep it in mind when fixing 1198381 instead.
Flags: needinfo?(afarre)
(Reporter)

Comment 5

11 months ago
All newer landed changesets have passing Mn-e10s tests for Windows 7 VM debug. So the backout on bug 1312683 actually fixed this problem.
Status: NEW → RESOLVED
Last Resolved: 11 months ago
status-firefox51: --- → unaffected
status-firefox52: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
(Reporter)

Comment 6

11 months ago
(In reply to Henrik Skupin (:whimboo) from comment #5)
> All newer landed changesets have passing Mn-e10s tests for Windows 7 VM
> debug. So the backout on bug 1312683 actually fixed this problem.

Well, I meant bug 1198381.
(Reporter)

Updated

11 months ago
No longer blocks: 1312624
(Reporter)

Updated

11 months ago
No longer blocks: 1312629

Comment 7

11 months ago
20 automation job failures were associated with this bug yesterday.

Repository breakdown:
* autoland: 12
* mozilla-inbound: 4
* mozilla-central: 4

Platform breakdown:
* windows7-32-vm: 20

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1312683&startday=2016-10-25&endday=2016-10-25&tree=all

Comment 8

11 months ago
22 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* autoland: 14
* mozilla-inbound: 4
* mozilla-central: 4

Platform breakdown:
* windows7-32-vm: 22

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1312683&startday=2016-10-24&endday=2016-10-30&tree=all