Open
Bug 1278588
Opened 8 years ago
Updated 2 years ago
ASSERTION: Potential deadlock detected (ObservedDocShell::ClearMarkers(), locking mOffTheMainThreadTimelineMarkers)
Categories
(Core :: DOM: Navigation, defect, P3)
Tracking
()
NEW
| | Tracking | Status |
|---|---|---|
| firefox50 | --- | affected |
People
(Reporter: MatsPalmgren_bugz, Unassigned)
Details
(Whiteboard: btpp-followup-2016-06-14)
Per bug 1258079 comment 3:

(In reply to Randell Jesup [:jesup])
> Matt (sic) - that's a totally different bug; it's deadlocking in
> ObservedDocShell::ClearMarkers(), locking mOffTheMainThreadTimelineMarkers.
> Please file a bug in that component

Log snippet with stack: https://bug1258079.bmoattachments.org/attachment.cgi?id=8736401
(note that it happened a while ago, on 2016-03-30)
Updated•8 years ago
Flags: needinfo?(ttromey)
Updated•8 years ago
Whiteboard: btpp-followup-2016-06-14
Comment 1•8 years ago
The last time a deadlock came up here was: https://bugzilla.mozilla.org/show_bug.cgi?id=1253516 IIRC during that investigation :fitzgen said that other code paths looked questionable from a locking perspective. I'll try to take a look at it soon.
Comment 2•8 years ago
I'm going to try to solve this one in bug 1283887; but if not I'll handle it here.
Assignee: nobody → ttromey
Status: NEW → ASSIGNED
Flags: needinfo?(ttromey)
Comment 3•8 years ago
I haven't been able to reproduce this. I thought it might be worthwhile to spell out what I did, for double-checking.

After looking at the stack trace and reading the informative comment in xpcom/glue/DeadlockDetector.h, my understanding is that the bug is a potential deadlock due to inconsistent lock ordering. In particular, sometimes the order is first sMutex and then the observed docshell mutex, and sometimes the reverse.

First, I did a debug build, to enable the deadlock detection. I made sure the deadlock detection was working by running mach cppunittest and looking for the deadlock detection test.

Then, I ran dom/animation/test/chrome/test_restyles.html. I ran it many times (e.g., with --repeat 30 and with --repeat 100), both with and without chaos mode (using "MOZ_CHAOSMODE=0"). This never showed the warning in question.

Since the original log doesn't include the stack traces of the offending mutex acquisitions, it's hard to know the "reversed" spot. One suspect is the deadlock found in bug 1253516.

Mats - can you still reproduce and if so, how?
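
The inconsistent-ordering problem described above can be illustrated with a toy checker. This is only a sketch of the general idea, not the algorithm in xpcom/glue/DeadlockDetector.h: the class name LockOrderChecker and the lock name "docShellMutex" are made up for illustration, it models a single thread, and it only catches direct (non-transitive) reversals.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy lock-order checker (not Mozilla's DeadlockDetector).
// It remembers every "A was held while B was acquired" pair and reports
// an acquisition that reverses a pair seen earlier.
class LockOrderChecker {
 public:
  // Record that `name` is being acquired while the locks in mHeld are held.
  // Returns false if this reverses an order that was observed before.
  bool Acquire(const std::string& name) {
    bool consistent = true;
    for (const std::string& held : mHeld) {
      // Earlier we saw `name` taken before `held`; now it is the reverse.
      if (mOrder.count(std::make_pair(name, held)) > 0) {
        consistent = false;
      }
      mOrder.insert(std::make_pair(held, name));
    }
    mHeld.push_back(name);
    return consistent;
  }

  // Release the most recently acquired lock (this sketch assumes LIFO).
  void Release(const std::string& name) {
    assert(!mHeld.empty() && mHeld.back() == name);
    mHeld.pop_back();
  }

 private:
  std::vector<std::string> mHeld;                        // locks held now
  std::set<std::pair<std::string, std::string>> mOrder;  // (first, second)
};
```

With this sketch, acquiring "sMutex" and then "docShellMutex" records one order; a later attempt to acquire "sMutex" while "docShellMutex" is held makes Acquire return false, even though no real deadlock has happened yet. That is why such a warning can fire on a run that otherwise completes normally.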
Flags: needinfo?(mats)
Comment 4•8 years ago (Reporter)
No, I haven't seen this assertion since that one occurrence.
Flags: needinfo?(mats)
Comment 5•8 years ago
We've started seeing this assertion again, but only on Mac OS X 10.10. See bug 1244897 comment 18. Tom, any suspicions about what changed around the day it started?
Flags: needinfo?(ttromey)
Comment 6•8 years ago
It seems to me that the bug could occur if ObservedDocShell::PopMarkers causes a GC. At this point, the ObservedDocShellMutex would be held. Then the GC can call CycleCollectedJSContext::GCNurseryCollectionCallback to add a marker. This will call TimelineConsumers::AddMarkerForAllObservedDocShells, locking sMutex. However, many other calls lock sMutex first before acquiring ObservedDocShellMutex.

This is just a theory because the logs only have stack traces for one of the mutex acquisition paths.

There is some code in ObservedDocShell to avoid pushing markers that were created during the course of popping. One idea for a fix might be to move this logic to TimelineConsumers.

Is there a reliable way to reproduce this bug?
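
To make the fix idea above concrete, here is a rough sketch of doing the "ignore markers created while popping" check at the consumers level, before any mutex is taken. Everything in it is a hypothetical stand-in: the Consumers class, the tlsPopping flag, and the callback parameter are invented for illustration and do not reflect the real TimelineConsumers or ObservedDocShell code. It also assumes the re-entrant call happens on the same thread as the pop.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical per-thread flag: true while this thread is popping markers.
thread_local bool tlsPopping = false;

// Hypothetical stand-in that plays both the TimelineConsumers role
// (mConsumersMutex, like sMutex) and the docshell role (mDocShellMutex).
class Consumers {
 public:
  // Returns false if the marker was dropped because a pop is in progress.
  bool AddMarker(const std::string& marker) {
    if (tlsPopping) {
      // Bail out before touching any mutex, so the reversed order
      // (docshell mutex, then consumers mutex) is never taken here.
      return false;
    }
    std::lock_guard<std::mutex> outer(mConsumersMutex);
    std::lock_guard<std::mutex> inner(mDocShellMutex);
    mMarkers.push_back(marker);
    return true;
  }

  // `duringPop` stands in for work that might trigger a GC callback.
  std::vector<std::string> PopMarkers(const std::function<void()>& duringPop) {
    std::lock_guard<std::mutex> inner(mDocShellMutex);
    struct Guard {  // resets the flag even if duringPop throws
      Guard() { tlsPopping = true; }
      ~Guard() { tlsPopping = false; }
    } guard;
    duringPop();
    std::vector<std::string> out;
    out.swap(mMarkers);
    return out;
  }

 private:
  std::mutex mConsumersMutex;
  std::mutex mDocShellMutex;
  std::vector<std::string> mMarkers;
};
```

In this sketch a marker added from inside the pop callback is silently dropped rather than queued. Whether dropping is acceptable, and whether the real re-entrant path stays on one thread, would need to be checked against the actual code.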
Flags: needinfo?(ttromey) → needinfo?(hiikezoe)
Comment 7•8 years ago
(In reply to Tom Tromey :tromey from comment #6)
> Is there a reliable way to reproduce this bug?

No, I've never seen this assertion locally. I thought this assertion had been fixed because we hadn't seen the failure (bug 1244897) for six months.
Flags: needinfo?(hiikezoe)
Comment 8•7 years ago
As this is blocking bug 1244897, which is one of our higher-frequency intermittents, I would like to understand the next steps for this bug.
Comment 9•7 years ago
(In reply to Joel Maher ( :jmaher) from comment #8)
> as this is is blocking bug 1244897 which is one of our higher frequency
> intermittents, I would like to understand the next steps of this bug.

It's been on the back burner due to both lack of time and my perception that the bug was very intermittent and not reproducible. If it's happening more often than I thought, I can make it a higher priority.
Comment 10•7 years ago
I just realized that bug 1305325 may be causing this deadlock. If so, I guess it will be fixed by bug 1324966.
Updated•7 years ago
Priority: -- → P3
Comment 11•7 years ago
Docshell markers are going to be removed in bug 1421651. Maybe I should close this as wontfix, but meanwhile marking the dependency and dropping it.
Updated•2 years ago
Severity: normal → S3