Closed Bug 1191145 Opened 4 years ago Closed 4 years ago

Stop blocking scripts when handling IPC messages

Categories

(Core :: IPC, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla44
Tracking Status
firefox42 --- affected
firefox44 --- fixed
firefox47 --- fixed

People

(Reporter: billm, Assigned: billm)

References

Details

Attachments

(1 file)

Attached patch patchSplinter Review
We added the script blocking to handle the case where a CPOW caused the child process to run a nested event loop somehow. That would cause a deadlock. Now we handle this case by canceling the CPOW, so I think there's no reason to have a script blocker (which often was ineffective anyway).

I think we should remove this code since it's pretty flaky. We sometimes run scripts even if there's a script blocker on the stack. We also print all sorts of NS_ASSERTIONs about this during tests, which everyone just ignores.
Attachment #8643390 - Flags: review?(dvander)
I really don't like this. We're trading off some amount of addon compatibility with the certainty that at some point web script will enter a nested loop and the addon will experience inconsistent behavior and unexpected exceptions.

I'm also really scared about "we print all sorts of NS_ASSERTIONs about this during tests". I thought this was going to be a fatal error, and if we're experiencing this in our own automated tests without addons, there's clearly something seriously wrong that we should fix instead of ignoring-via-thrown-exceptions.
Do we also cancel CPOWs when content script attempts to enter plugin code? That is equally dangerous in terms of deadlocks or other out-of-sync behavior.
Bill, and Benjamin, did you reach a conclusion on this topic?  I'm asking because David is the reviewer, and the code that implements the above described intent is trivially correct, but I'm worried about whether we agree about the intent itself.
Flags: needinfo?(wmccloskey)
Flags: needinfo?(benjamin)
No, we reached no conclusion. I still oppose this change in general, but we haven't talked about it again.
Flags: needinfo?(benjamin)
Sorry this too me so long. I've been busy and also avoiding this.

Here's the situation we're in now (without this patch):

When we dispatch a CPOW, we put a script blocker on the stack. If we try to run content JS during this time, three things can happen:
1. We delay running the JS code until after the CPOW has been processed. This is the ideal case.
2. We never run the JS code.
3. We run the JS code even though we aren't supposed to and we print a warning.

Additionally, we used to crash if you entered content JS code during a CPOW, but that made Firefox completely unusable with some add-ons installed, so I backed it out a long time ago (bug 950745). The main issue was add-ons that call dispatchEvent on some content DOM node.

There have never been any NPAPI checks. I asked you how to implement them in bug bug 950745 and you didn't respond. I'm not sure what they would be checking for. Plugins don't do synchronous communication with the chrome process, so they would have to call back into the content process to trigger problems, and I assume that wouldn't be any different than content JS at that point.

In any of the cases above where we do run content JS during a CPOW and it triggers a nested event loop, we cancel the CPOW.

In my opinion, this patch makes our behavior more consistent. Rather than randomly block some JS and not other JS, we allow it to run and rely on cancellation to avoid deadlock.

It sounds like your objection is to the idea of cancellation, since it means the add-on will sometimes work and not other times (like on pages that do sync XHRs). I think we just have a fundamental disagreement about what CPOWs are for. They're there because I think most users would rather have a sorta working add-on than no add-on. We're being pretty up front about the fact that CPOWs are terrible. Very few add-on developers that I know of are using them intentionally.

(In reply to Benjamin Smedberg  [:bsmedberg] from comment #1)
> I'm also really scared about "we print all sorts of NS_ASSERTIONs about this
> during tests". I thought this was going to be a fatal error, and if we're
> experiencing this in our own automated tests without addons, there's clearly
> something seriously wrong that we should fix instead of
> ignoring-via-thrown-exceptions.

The assertions happen because we're running content JS when there is a script blocker on the stack (case 3 above). That works fine as long as there's no nested event loop involved.

Again, this is about getting something to work. I would rather have these tests in the state they're in than not have them. Eventually, we'll rewrite them using BrowserTestUtils or whatever, but until then, they serve a purpose.

This patch will allow us to get more tests working since it avoids case 2 above, which usually breaks tests.
Flags: needinfo?(wmccloskey)
Attachment #8643390 - Flags: review?(dvander) → review+
https://hg.mozilla.org/mozilla-central/rev/344cb85a558f
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla44
Backed out to see if it's related to a topcrash.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
https://hg.mozilla.org/mozilla-central/rev/b237f1bffcfd
Status: REOPENED → RESOLVED
Closed: 4 years ago4 years ago
Resolution: --- → FIXED
Blocks: 1239767
Duplicate of this bug: 1239767
Duplicate of this bug: 1244921
You need to log in before you can comment on or make changes to this bug.