Closed Bug 1315803 Opened 8 years ago Closed 2 months ago

don't let postMessage() flood the main thread

Categories

(Core :: DOM: postMessage, defect, P3)

defect

Tracking

()

RESOLVED INCOMPLETE

People

(Reporter: bkelly, Unassigned)

References

(Blocks 2 open bugs)

Details

(Whiteboard: dom-lws-bugdash-triage)

Attachments

(4 obsolete files)

Attached patch wip (obsolete) — Splinter Review
This builds on the timer ThrottledEventQueue work in bug 1300659. https://treeherder.mozilla.org/#/jobs?repo=try&revision=c36464ef409498f90ad4ee2b4c232137086b561e The back pressure part of this patch is not quite right. I may not implement back pressure immediately since we never stop message events today.
Assignee: nobody → bkelly
Status: NEW → ASSIGNED
One thing people have asked is as fast postMessage (between windows) as possible, not being blocked but other stuff. Just something to keep in mind.
Priority: -- → P3
Assignee: ben → nobody
Status: ASSIGNED → NEW
Component: DOM → DOM: Core & HTML
Blocks: eviltraps

It's very similar to bug 1514413, this one is postMessage, that bug is Http request. If the parent gets flooded by those events, the browser will completely hang, some extreme case may even force user to reboot the machine.

Chrome doesn't have this problem.

The existing two dependencies were resolved. What work is left here?

Component: DOM: Core & HTML → DOM: postMessage

When trying to apply the patch to the current head revision, I found that

The last point removes the ability to directly put the ThrottledEventQueue, but this change might be reverted.

Andrew, could you provide some guidance if implementing this still makes sense? I can supply a rebased patch if this makes answering this easier.

Flags: needinfo?(bugmail)
Assignee: nobody → sgiesecke
Status: NEW → ASSIGNED

I now adapted the patch, split up into 3 patches, together with some cleanup, so maybe comment on Phabricator directly.

(In reply to Simon Giesecke [:sg] [he/him] from comment #4)

Andrew, could you provide some guidance if implementing this still makes sense? I can supply a rebased patch if this makes answering this easier.

I think we should abandon the patchset and close the bug in favor of other bugs that are about the problem than a patch holding bug. This is a real problem, but is likely best considered at a higher level more comprehensively as part of the various scheduler changes coming up and including the performance team. I'm planning to attend https://berlinallhandsjanuary2020.sched.com/event/YwoR/workshop-scheduler-design-update-and-next-steps where I'm interested in understanding if that work will interact with this problem space.

In particular:

  • Various worker debugger changes have been happening to better support worker debugging and I wouldn't want to land random changes and risk complicating things for :bhackett.
  • Throttling that doesn't actually provide some variant of backpressure or throttling (ex: token bucket) results in a memory leak, which isn't necessarily an improvement, and there's no analysis on this here. (Which is part of my prior paragraph point.)
Flags: needinfo?(bugmail)
Attachment #8808388 - Attachment is obsolete: true
Attachment #9116999 - Attachment is obsolete: true
Assignee: sgiesecke → nobody
Status: ASSIGNED → NEW
Attachment #9117001 - Attachment is obsolete: true
Attachment #9117000 - Attachment is obsolete: true

(In reply to Andrew Sutherland [:asuth] (he/him) from comment #9)

I think we should abandon the patchset and close the bug in favor of other bugs that are about the problem than a patch holding bug. This is a real problem, but is likely best considered at a higher level more comprehensively as part of the various scheduler changes coming up and including the performance team.

Hi Andrew, to be more concrete: Do we have a bug we can dupe this on then?

Flags: needinfo?(bugmail)
Severity: normal → S4

(In reply to Jens Stutte [:jstutte] from comment #10)

Hi Andrew, to be more concrete: Do we have a bug we can dupe this on then?

No, we would need to file new bugs. And this would also entail figuring out what we want our policy to be. This may be better spun off in favor of a product-level decision/investigation into whether there are cases where we should be throttling workers or the impact of workers. There's real potential for cross-browser compat issues here.

That said, e10s and Fission help alleviate much of the original concern here. Specifically:

  1. e10s means that workers aren't harming the general responsiveness as the browser as a whole because the parent process's main thread is not interfered with.
  2. Fission means thats the performance implications of spammed postMessage on windows should only be meaningfully impact that origin's main thread.
    • That said, for BroadcastChannel (which this patch touched) and MessageChannel (related), those messages still do get routed through the PBackground thread so there is some potential concern there. But this patch did nothing to address that.
Flags: needinfo?(bugmail)

Restating:

  • :bkelly's changes to BroadcastChannel and Window(Proxy).postMessages in the patch were to use ThrottledEventQueue or otherwise defer delivery of postMessage in order to avoid having the postMessage tasks dominating the main-thread run loop.
  • This is a tricky scenario because doing something like this can potentially result in OOM without a capability to throttle the source of the postMessage calls. This stack did nothing to throttle the source; we only have such a mechanism in place for Workers using WorkerGlobalScope.postMessage, and it does accomplish that. This makes this a much harder problem.
  • As I said in comment 11, in e10s and fission have largely mitigated concerns related to this.

From triage discussion:

  • We are overhauling MessagePort in bug 1752287 which creates some new potential for backpressure, or at least in terms of allowing us to ensure that the OOM could be localized to the sender which is failing to utilize backpressure.
  • We are investigating / working on worker throttling in bug 1818463 that could potentially integrate with backpressure signals, and this could potentially be hooked up to the changes above. But this would be done as new follow-ups.
  • It would be desirable to have new real-world scenarios / profiling that shows that we are experiencing real problems here.
Status: NEW → RESOLVED
Closed: 2 months ago
Resolution: --- → INCOMPLETE
Whiteboard: dom-lws-bugdash-triage
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: