Closed Bug 916753 Opened 11 years ago Closed 11 years ago

Deadlock on the operation callback lock?

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla27
Tracking Status
firefox26 + fixed
firefox27 --- fixed

People

(Reporter: bzbarsky, Assigned: bhackett1024)

References

Details

(Whiteboard: [qa-])

Attachments

(2 files)

I got up this morning to Firefox being frozen.  When I sampled it, all mainthread samples were under __psynch_mutexwait when acquiring the lock in JSRuntime::triggerOperationCallback.

The attachment has the stacks for all threads.

I assume we somehow deadlocked on this lock...
Attached file Stacks
This could be the same issue as bug 916531... Unfortunately I don't see a useful stack there so it's hard to say for sure.
And indeed.  The mainthread stack has CodeGenerator::link (which locks the operation callback lock) calling newCode (done while the lock is held), which ends up allocating an arena, which ends up calling TriggerZoneGC (inlined?) which does TriggerOperationCallback (also inlined?).

The AutoLockForOperationCallback in link() was added in bug 864220, looks like.
Blocks: 864220
Thanks Boris, requesting needinfo from bhackett to make sure we don't forget about this.
Flags: needinfo?(bhackett1024)
Bug 916531 does have stacks.  See the breakpad incidents, and show all threads.  They look unrelated to this issue.
Attached patch patchSplinter Review
We need to prevent the main thread from trying to reenter the operation callback lock, and in this case can't really do so when allocating the IonCode as that allocation happens in the same call that allocates the code memory from the Ion executable allocator (accesses to which need to be protected by the operation callback lock).  The cleanest solution seems to be to make rt->currentThreadOwnsOperationCallbackLock() available to release builds, and just avoid triggering GCs when allocating in such cases.
Assignee: general → bhackett1024
Attachment #805311 - Flags: review?(jdemooij)
Flags: needinfo?(bhackett1024)
Comment on attachment 805311 [details] [diff] [review]
patch

Is PR_GetCurrentThread cheap enough or AutoLockForOperationCallback rare enough?
(In reply to Boris Zbarsky [:bz] from comment #7)
> Comment on attachment 805311 [details] [diff] [review]
> patch
> 
> Is PR_GetCurrentThread cheap enough or AutoLockForOperationCallback rare
> enough?

Both.
Attachment #805311 - Flags: review?(jdemooij) → review+
Comment on attachment 805311 [details] [diff] [review]
patch

Review of attachment 805311 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/vm/Runtime.h
@@ -737,1 @@
>              rt->operationCallbackOwner = PR_GetCurrentThread();

Can we JS_ASSERT(!rt->currentThreadOwnsOperationCallbackLock()) here, to turn this deadlock into an assertion failure next time?
https://hg.mozilla.org/mozilla-central/rev/4bcf9b261b94
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla27
Depends on: 918862
According to bug 918862, this bug supposedly caused some regressions:
- Inbound-non-pgo: TART (tab animations) on Ubuntu (3.4%), Ubuntu 64 (5.8%), TP5-optimized win8 (2.7%), Paint win8 (4.2%).
- Aurora (pgo?): TART Ubuntu 64 (9.4%), Ubuntu (5%), win XP (2.7%).

- Are these expected from this bug or can be explained in retrospective?
- Is this a legitimate price to pay for this bug? If not, can the cost be reduced?
Flags: needinfo?(bzbarsky)
> - Are these expected from this bug or can be explained in retrospective?

You're asking me?  Shouldn't you be asking Brian?
Flags: needinfo?(bzbarsky)
Flags: needinfo?(bhackett1024)
Flags: needinfo?(bhackett1024)
Whiteboard: [qa-]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: