Closed Bug 1239767 Opened 8 years ago Closed 8 years ago

[e10s] Periodic deadlock between content process & chrome process, with NoScript

Categories

(Core :: IPC, defect)

defect
Not set
normal

RESOLVED DUPLICATE of bug 1191145
Tracking Status
e10s + ---

People

(Reporter: dholbert, Unassigned)

Attachments

(2 files)

I've been running with e10s and NoScript enabled recently, and I've been experiencing periodic whole-browser lockups, with very low CPU usage. These seem to go away if I disable e10s or turn off NoScript.

When I run into a hang like this, I usually give it ~30 seconds to resolve itself, and then I kill the content process manually (with "kill -11" to generate a crash report, to see where it's hanging). This produces crash stacks like these:
bp-4c737c83-41fe-43cc-8d68-f65ec2160114
bp-493595f8-deb8-4cd5-907a-d9b2f2160114

If you scroll down far enough, you'll see that this content-process hang is inside of nsContentPolicy::CheckPolicy, which calls into script, which in turn calls something that tries to send a synchronous message to the chrome process and ends up stuck in mozilla::CondVar::Wait. If I generate a chrome-process crash report during the hang, it's also stuck in mozilla::CondVar::Wait: bp-0cad7b56-c47f-45a4-93bc-9d1ff2160105

So, we're effectively deadlocked, with each process in mozilla::CondVar::Wait.
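
To make the shape of this hang concrete, here is a minimal toy sketch of two sides that each send a synchronous request and then block on a condition variable waiting for a reply. It is only an illustration under stated assumptions: `Endpoint`, `send_sync`, and `run_demo` are hypothetical names, threads stand in for the two processes, and nothing here mirrors Gecko's actual IPC code. A timeout is used so the demo terminates instead of hanging forever.

```python
import threading


class Endpoint:
    """Toy stand-in for one side of a sync channel (hypothetical, not Gecko's API)."""

    def __init__(self, name):
        self.name = name
        self.cond = threading.Condition()
        self.reply_ready = False  # would be set by the peer when it replies
        self.inbox = []           # requests queued by the peer
        self.peer = None

    def send_sync(self, timeout):
        # Queue a request on the peer.
        with self.peer.cond:
            self.peer.inbox.append(f"request from {self.name}")
        # Block until a reply arrives. While waiting, this side never drains
        # its own inbox, so it can never answer the peer's request.
        with self.cond:
            return self.cond.wait_for(lambda: self.reply_ready, timeout=timeout)


def run_demo(timeout=0.2):
    content, chrome = Endpoint("content"), Endpoint("chrome")
    content.peer, chrome.peer = chrome, content
    results = {}

    def worker(endpoint):
        results[endpoint.name] = endpoint.send_sync(timeout)

    threads = [threading.Thread(target=worker, args=(ep,)) for ep in (content, chrome)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results  # True means a reply arrived; False means the wait timed out


if __name__ == "__main__":
    print(run_demo())
```

In this toy model both waits time out, because each side has a request sitting in its inbox that it never services. Without the timeout, both would sit in the condition-variable wait indefinitely, which is the pattern the crash stacks above suggest.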

I think Bill's been running with NoScript to see if he can trigger this. Giorgio, I'm curious if you've hit this as well.

Not sure whether this should be considered a Tech Evang|Add-ons issue or a Core issue. Starting it as the former for now.
Component: Desktop → Add-ons
I don't have concrete STR, unfortunately. But when I hit this today (twice in a row), I had just selected NoScript's "Temporarily allow scripts" for a particular domain on a particular site.  Subsequent visits to the same site triggered the hang as well, but only in my main browsing profile. (Not in a fresh profile w/ NoScript installed.)

Here's another chrome-process crash report during the hang (just generated now) -- this one's more interesting than the one in comment 0, as it's got some addon stuff in the backtrace -- xpc::AddonWrapper<js::CrossCompartmentWrapper>::set(...), at backtrace level 13:   bp-5b432262-7a54-4fed-b890-c2d5d2160114
Hmm, my STR from comment 1 do seem to reproduce this quite reliably, *in my normal browsing profile* at least. (Not in a fresh profile.)

The site in question is http://www.gofreebies.com/deactivateEmail.asp [1].  My current STR (for my browsing profile) are to visit that site, and then tell NoScript to temporarily allow scripts from that domain (gofreebies.com).

[1] (a service which another Mr./Mrs. "D. Holbert" annoyingly signed up for, using a "dholbert" email address of mine, by accident presumably)
Thank you Daniel.

Given the stack trace alone, the culprit is pretty obviously the nsIContentPolicy implementation going through the shims (whether they're supposed to deadlock or not, but I suppose they are, because I've seen several "unsafe CPOW usage" instances).
The problem will certainly go away as soon as this component migrates entirely into the child process, something which is definitely going to happen within a month.
Assignee: nobody → g.maone
Ah, nice! Maybe I won't bother digging into this much further then -- but I'll save a snapshot of my current profile, so I can verify that this is fixed in the future by using my STR from comment 1.

I'll make a note on my calendar to check back here in a month (but if you make the changes that you expect to fix this sooner than that, please feel free to ping me & I'll re-test).  In the meantime, I'm going to disable e10s.

(One further note on my STR: it looks like they're not actually 100% reliable; it depends on there being some background activity, I think (many other tabs open), and even then it sometimes takes a few tries to reproduce.  So, there's a race condition of some sort, and if the race turns out the wrong way, there's a deadlock.)
Byron can you add the tracking-e10s flag to this component?
Flags: needinfo?(glob)
(In reply to Brad Lassey [:blassey] (use needinfo?) from comment #5)
> Byron can you add the tracking-e10s flag to this component?

done
Flags: needinfo?(glob)
(Bill suspected this might've become fixed [in the platform] by bug 1240985, but I was just able to reproduce it with current Nightly: bp-07f0253a-3080-4e5d-9fa1-bd9c62160201 )
One more data-point -- when we get into this situation:
 - The child process is hanging in PR_WaitCondVar.
 - The parent process is hanging in the while (true) loop at the bottom of mozilla::ipc::MessageChannel::Send.


(If I randomly make GDB break in the parent process with Ctrl-C, it's usually inside of PR_WaitCondVar which is inside of CondVar::Wait. So, that's where the parent process is spending most of its time. But I can "fin" out of those functions, up to ::Send, at which point I can't "fin" out of the function.)
Attached file nspr log for parent
Here's an NSPR log, generated from a debug build with my profile & STR from comments 1-2. Used this command for logging:
NSPR_LOG_FILE=/tmp/nspr_log.txt NSPR_LOG_MODULES=ipc:5,sync

When we start hanging, the NSPR log just repeats these same lines over and over:
[0x7f6d2cc964a0]: D/ipc ProcessPendingRequests for seqno=3371, xid=3371
[0x7f6d2cc964a0]: D/ipc ShouldDeferMessage(seqno=-254) = 1
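
For anyone scanning a similar log, a small helper along these lines might help spot the point where the output degenerates into a repeating block. This is just a sketch: `repeating_tail` is a hypothetical helper, not part of any Mozilla tooling, and the sample lines are the two quoted above.

```python
def repeating_tail(lines, max_period=8, min_repeats=3):
    """Return (block, repeats) if the log ends in a block of up to
    max_period lines repeated at least min_repeats times, else None."""
    for period in range(1, max_period + 1):
        if len(lines) < period * min_repeats:
            break
        block = lines[-period:]
        repeats = 0
        pos = len(lines)
        # Walk backwards from the end while each chunk matches the final block.
        while pos - period >= 0 and lines[pos - period:pos] == block:
            repeats += 1
            pos -= period
        if repeats >= min_repeats:
            return block, repeats
    return None


if __name__ == "__main__":
    hang = [
        "[0x7f6d2cc964a0]: D/ipc ProcessPendingRequests for seqno=3371, xid=3371",
        "[0x7f6d2cc964a0]: D/ipc ShouldDeferMessage(seqno=-254) = 1",
    ]
    sample = ["(earlier, non-repeating output)"] + hang * 4
    print(repeating_tail(sample))
```

It tries the shortest period first, so a two-line cycle like the one above is reported as a two-line block rather than as a longer multiple of it.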
Attached file nspr log for child
Attachment #8714522 - Attachment description: nspr log → nspr log for parent
I'm unable to reproduce this in an up-to-date mozilla-inbound debug build (though I was able to reproduce using a debug build yesterday).

Given Bill's assertion in bug 1244921, it sounds like this was fixed by bug 1191145.

I'm going to reclassify this as an IPC bug, and dupe it to the bug that (I think) fixed it -- bug 1191145. (Right now this is classified as an add-ons tech-evang bug, since it only reproduced with NoScript; but it seems NoScript was just triggering an underlying platform issue here.)
Assignee: g.maone → nobody
Status: NEW → RESOLVED
Closed: 8 years ago
Component: Add-ons → IPC
Depends on: 1191145
Product: Tech Evangelism → Core
Resolution: --- → DUPLICATE
Version: unspecified → Trunk