Open Bug 2008615 Opened 9 days ago Updated 13 hours ago

postMessage during visibilitychange with document.visibilityState == 'hidden' is dropped

Tracking

()

Status:

ASSIGNED

People

(Reporter: jrmuizel, Assigned: asuth)

References

(Blocks 1 open bug)

Details

(Keywords: webcompat:platform-bug)

User Story

user-impact-score:360

Jeff Muizelaar [:jrmuizel]

Reporter

Description

•

9 days ago

See the test case from bug 2004106: https://bugzilla.mozilla.org/attachment.cgi?id=9535715

STR:

1. Load the testcase from a server other than Bugzilla. (Bugzilla has cache headers that prevent BFCache from working.)
2. Click the button, which makes it more obvious whether BFCache is being used.
3. Click the Wikipedia link.
4. Go back.
5. Observe the output.

Expected:

Page startup
Received message: Hello from port1!
Button clicked
Document is now hidden
Sending message via the port
Document is now visible
Received message: port message is hidden

Actual:

Page startup
Received message: Hello from port1!
Button clicked
Button clicked
Document is now hidden
Sending message via the port
Document is now visible
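
For readers without the attachment at hand, the logging pattern above is consistent with a visibilitychange handler along the following lines. This is only a sketch reconstructed from the log output, not the attached testcase: `makeVisibilityHandler`, `VisibilitySource`, `send`, and `log` are hypothetical names, and the message text is inferred from the expected log.

```typescript
// Hypothetical reconstruction from the log lines; not the attached testcase.
// Minimal structural type so the sketch does not depend on DOM typings.
interface VisibilitySource {
  visibilityState: string;
}

// Builds a handler that logs the visibility state and, when the document has
// just become hidden, sends one message through the supplied `send` callback.
function makeVisibilityHandler(
  doc: VisibilitySource,
  send: (data: string) => void,
  log: (line: string) => void,
): () => void {
  return () => {
    log(`Document is now ${doc.visibilityState}`);
    if (doc.visibilityState === "hidden") {
      log("Sending message via the port");
      send(`port message is ${doc.visibilityState}`);
    }
  };
}
```

In a page, the handler would presumably be registered with `document.addEventListener("visibilitychange", ...)`, with `send` forwarding to `postMessage` on one port of a `MessageChannel` and the other port's `onmessage` logging "Received message: ...". The bug is that the final "Received message" line never appears after going back.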

Jeff Muizelaar [:jrmuizel]

Reporter

Updated

•

9 days ago

Blocks: 2004106

Jeff Muizelaar [:jrmuizel]

Reporter

Comment 1

•

9 days ago

We should probably make sure we don't drop the messages and just queue them instead.

Chrome seems to have a limit of 1 million messages. (I don't know where this is implemented). See https://bugzilla.mozilla.org/show_bug.cgi?id=2004106#c21

Olli, where's the right place to implement this?
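
The "queue instead of drop, with a limit" idea could be sketched roughly as below. This is a hypothetical model, not Gecko or Chromium code: `FrozenPortBuffer`, `maxPending`, and `onEvict` are names invented for illustration, and the cap is left to the caller rather than assuming any particular number.

```typescript
// Hypothetical model of buffering port messages while a page is frozen.
class FrozenPortBuffer<T> {
  private pending: T[] = [];
  private frozen = false;
  private evicted = false;
  private readonly deliver: (message: T) => void;
  private readonly maxPending: number; // assumed cap on buffered messages
  private readonly onEvict: () => void; // stands in for bfcache eviction

  constructor(deliver: (message: T) => void, maxPending: number, onEvict: () => void) {
    this.deliver = deliver;
    this.maxPending = maxPending;
    this.onEvict = onEvict;
  }

  // Called when the page enters the bfcache.
  freeze(): void {
    this.frozen = true;
  }

  // Called when the page is restored; flushes buffered messages in order.
  thaw(): void {
    this.frozen = false;
    if (this.evicted) return;
    const queued = this.pending;
    this.pending = [];
    for (const message of queued) this.deliver(message);
  }

  // Delivers immediately when not frozen; otherwise buffers, and gives up
  // (evicts) once the backlog would exceed the cap.
  receive(message: T): void {
    if (this.evicted) return;
    if (!this.frozen) {
      this.deliver(message);
      return;
    }
    if (this.pending.length >= this.maxPending) {
      this.evicted = true;
      this.pending = [];
      this.onEvict();
      return;
    }
    this.pending.push(message);
  }

  get isEvicted(): boolean {
    return this.evicted;
  }
}
```

The point of the cap is that a frozen page cannot be made to hold an unbounded backlog; once it is exceeded, the sketch falls back to eviction, which is today's behavior on the first message.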

BugBot [:suhaib / :marco/ :calixte]

Updated

•

9 days ago

Keywords: webcompat:platform-bug

BugBot [:suhaib / :marco/ :calixte]

Updated

•

8 days ago

User Story: (updated)

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 2

•

7 days ago

:smaug, :zcorpan, I have some questions about the non-normative text regarding the port message queue that I reference below, and I was wondering if you could (briefly) lend your understanding. I don't think the non-normative text is supported by the spec. As I'm reading it, it sounds like we'd somehow buffer the messages, but maybe the phrasing is just bad and it means to convey that the not-fully-active listeners will just miss (and never see) any messages received while they're not fully active?

If the document is fully active, but the event listeners were all created in the context of documents that are not fully active, then the messages will not be received unless and until the documents become fully active again.

Here's my understanding of the situation:

Our MessagePort implementation currently induces bfcache eviction upon receipt of messages as well as when postMessage is invoked (which potentially matters for iframes).
Spec-wise, unlike BroadcastChannel, which has an explicit eligible for messaging check that requires "a Window object whose associated Document is fully active" and that is used both to check that the postMessage caller is eligible and then to filter the destination set (which notably is currently not done in parallel...)...
There has also been some very interesting discussion about BroadcastChannel and BFCache on https://github.com/whatwg/html/issues/7253, where Chrome has moved towards allowing pages using BroadcastChannel into BFCache but evicting on message, and that's the proposed standardization.
The message port post message steps, which are the core of the MessagePort.postMessage() method, define a task that operates in terms of the port message queue, which is a task source that exists to handle the fact that message ports can be shipped/transferred repeatedly, even while messages are in flight. The transfer-receiving steps handle moving the tasks.
- The spec language around tasks and bfcache really centers on the concept of tasks being runnable, where "A task is runnable if its document is either null or fully active." The event loop only runs task queues with at least one runnable task and then explicitly filters to consider only runnable tasks. This means tasks should indeed either be buffered or the document evicted from bfcache, for which there has historically been a lot of implementation latitude, but obviously webcompat can be at odds with implementation latitude.
- There's a non-normative note around the port message queue that makes the very interesting assertion that "If the document is fully active, but the event listeners were all created in the context of documents that are not fully active, then the messages will not be received unless and until the documents become fully active again." which doesn't appear to be supported by the spec.
  - This text most recently changed in https://github.com/whatwg/html/pull/7694 (Remove "responsible document" concept) as part of https://github.com/whatwg/html/issues/4335 ("Consider removing settings object's responsible document"), with this assertion having previously existed but being updated. The previous update is more interesting, as it changed "then the messages will be lost" to the current phrasing. That change was made in https://github.com/whatwg/html/pull/5728 (Change MessagePort owner from incumbent to current), marked as closing https://github.com/whatwg/html/issues/4340 (Is the MessagePort's owner concept really necessary?) and helping with https://github.com/whatwg/html/issues/1430.
  - My specific spec concern is that the spec is very clear that the tasks are associated with the document/window that the MessagePort was transferred into, and event dispatch is synchronous and does not, to my understanding, magically create forked tasks. Additionally, although HTML has a check if we can run script algorithm that returns "do not run" if "the global object specified by settings is a Window object whose Document object is not fully active", that's only used in run a classic script and run a module script. AFAICT DOM Event dispatch uses invoke, which uses inner invoke, which uses WebIDL's call a user object's operation, which uses prepare to run script and prepare to run a callback, then uses TC39's IsCallable (which we pass), and then gets into TC39 Call, which directly uses the [[Call]] slot.
  - I'm not immediately able to find spec support for it (I could believe WebIDL might decorate the call slots, but I don't see text for that; I could also believe TC39 has a runtime hook, but I didn't find that in a brief look either), but our mozilla::dom::CallSetup::CallSetup method calls into mozilla::dom::CallSetup::GetActiveGlobalObjectForCall, which calls HasActiveDocument, which in turn calls nsPIDOMWindowInner::HasActiveDocument, which calls nsPIDOMWindowInner::IsCurrentInnerWindow, which will return false for non-fully-active windows, which should then fail CallSetup via the ErrorResult. So we just won't call those non-fully-active listeners, but a big question is whether other browsers have an implementation that doesn't match the spec but does match the non-normative note. There may be WPTs that cover this (or there should be, if there aren't), but I can't go further down this rabbit hole at this moment. I'm going to set needinfos on :smaug and :zcorpan, who may have more understanding about the non-normative text readily at hand.
We can enhance our implementation to stop evicting on the first message, instead hooking up a GlobalFreezeObserver.h and (carefully) using our existing buffering mechanism, unless the non-normative text is now standard, in which case we'll need a secondary buffering. Especially given the propensity of people to file eviltraps bugs about things that will obviously cause memory leaks, we should definitely have a heuristic to trigger bfcache eviction when the backlog reaches some level. A million messages feels high so I'd probably want to look exactly at the
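
To make the "runnable" rule quoted above concrete, here is a minimal sketch of how a task queue would hold back tasks whose document is not fully active. It is an illustrative model only, not spec text and not Gecko code: `DocumentLike`, `Task`, `isRunnable`, and `runNextRunnable` are hypothetical names.

```typescript
// Hypothetical model of the "runnable task" rule; not spec or Gecko code.
interface DocumentLike {
  fullyActive: boolean;
}

interface Task {
  document: DocumentLike | null;
  run: () => void;
}

// "A task is runnable if its document is either null or fully active."
function isRunnable(task: Task): boolean {
  return task.document === null || task.document.fullyActive;
}

// Runs the first runnable task in the queue, leaving non-runnable tasks in
// place so they can run once their document becomes fully active again.
// Returns whether a task was run.
function runNextRunnable(queue: Task[]): boolean {
  const task = queue.find(isRunnable);
  if (task === undefined) return false;
  queue.splice(queue.indexOf(task), 1);
  task.run();
  return true;
}
```

Under this model a port message task targeting a bfcached document simply stays queued rather than being dropped, which is the buffering behavior the normative text seems to imply; whether listeners from other not-fully-active documents should also see it later is the open question about the note.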

Assignee: nobody → bugmail

Severity: -- → S3

Status: NEW → ASSIGNED

Flags: needinfo?(zcorpan)

Flags: needinfo?(smaug)

Priority: -- → P2

Simon Pieters [:zcorpan]

Comment 3

•

13 hours ago

The non-normative text was changed by domenic in https://github.com/whatwg/html/pull/5728 without anyone commenting on the change to the non-normative note. Tests for this change were added in https://github.com/web-platform-tests/wpt/pull/24566

I think the non-normative note is just trying to match what would fall out from implementing the normative requirements. I don't know if the note is actually correct.

If we don't bfcache the old doc when it has a transferred entangled port, it's not observable if messages are dropped or queued.

Flags: needinfo?(zcorpan)

Flags: needinfo?(smaug)


Categories

(Core :: DOM: postMessage, defect, P2)
