Status

()

enhancement
P3
normal
2 years ago
a month ago

People

(Reporter: baku, Unassigned)

Tracking

(Blocks 1 bug)

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

(Reporter)

Description

2 years ago
+++ This bug was initially created as a clone of Bug #1333981 +++

We still need to cover this IPC messages:

PStorage::Msg_LoadItem
PContent::Msg_DispatchLocalStorageChange
PStorage::Msg_LoadDone
PStorage::Msg_LoadUsage
(Reporter)

Comment 1

2 years ago
The only concern I have is about PContent::Msg_DispatchLocalStorageChange. This can end up dispatching DOM events but here we have to use 1 single eventTarget: we cannot take the correct one from the windows.
Note that here the dispatching is synchronous.
Assignee: nobody → amarchesini
Attachment #8883902 - Flags: review?(wmccloskey)
Comment on attachment 8883902 [details] [diff] [review]
storage2.patch

Review of attachment 8883902 [details] [diff] [review]:
-----------------------------------------------------------------

I don't think this will work. For the DispatchLocalStorageChange message, would it be okay to make it dispatch the event asynchronously (that is, to set aImmediateDispatch to false)? That adds some latency to the event, but would it still be correct?

I also am worried that the other messages can't be SystemGroup. What would happen if you had code like this:

function eventHandler() {
  let value1 = localStorage.getItem('foo');
  let value2 = localStorage.getItem('bar');
  // assert that some invariant involving value1 and value2 is true
  // (maybe value2 == value1 + 1 or something)
}

I'm worried that we could preempt the event handler between setting value1 and value2 and then run a SystemGroup runnable that changes some value in localStorage. Is that possible with this patch? I'm afraid I don't know enough about how the code works to know.

Someone who understands this code better should review should review the next version.
Attachment #8883902 - Flags: review?(wmccloskey) → review-
:janv's moving PStorage to PBackground on bug 1350637, so I'm not sure we need to worry about this.  He has a try push up already that's basically there, and I'm set to review and will be able to turn it around pretty quickly.
Depends on: 1350637
Er, and he's going to move the PContent DispatchLocalStorageChange message too.
What thread does the PBackground stuff run on in the content process? Not the main thread? Presumably we have to dispatch the notifications to the main thread at some point. When that happens, will there be a better way to label them with Jan's patch?

Comment 6

2 years ago
The threading model for local storage is simple. The child side of PBackground communication is always the main thread (either main process or content process).
The parent side is the dedicated PBackground thread. There's also an I/O thread dedicated to Local Storage, this thread is created and shutdown on the PBackground thread.

There are some dispatches to the main thread here and there which do stuff that is main thread only, like registering observers.

Either way, I think this bug should be done after the move to PBackground.
Sorry, only meant to convey the patch was moot, not the underlying labeling issue which still stands.  I'll be looking at them as part of my review in bug 1350637.

Correctness concerns are relevant, but as I stated in https://bugzilla.mozilla.org/show_bug.cgi?id=1379568#c6, the previous labeling set of patches in bug 1333981 already regressed localStorage correctness.  It's my hope that :janv's changes make it easier to clean all of this up, but I need to read and understand them more thoroughly than my initial files-changed skim.  Ideally I can propose changes and/or provide a patch to layer on top of :janv's stack to make this happen now-ish.

In the best case, we'll end up with:
- Broadcast event is received, we can't associate it with a specific TabGroup at this time[1], which I guess makes it SystemGroup?  But that's okay because it just looks like propagation latency.  Loads also need to occur in this same group, whatever it is, in order to synchronize the load and broadcasts as perceived by the documents.
- The broadcast event is processed, which entails both:
  - The LocalStorageCache's store is updated at this time.  No more having CloneEvent mutate as a side effect.  This is where things are going wrong right now on trunk after bug 1333981.
  - The broadcast event is then multiplexed out to all of the specific documents, scheduled in their specific TabGroups.  This does mean that a page can observe a change by consulting its LocalStorage instance prior to receiving the event.  This *is* a change from our pre-e10s behavior when localStorage changes were dispatched synchronously, but is spec-compliant with https://html.spec.whatwg.org/multipage/webstorage.html#send-a-storage-notification which calls for us to "queue a task".


1: In actuality, we do and/or can know the origin that the LocalStorage changes are concerned with for all of the messages in question.  If we had a helper that did the following, I guess we could improve on things:
- Given an "actual" runnable and an origin, locate all TabGroups containing the given origin at the current time.
- Queue a "speculative" run-at-most-once runnable across every TabGroup that contains the origin.
- When the speculative runnable runs, have it Run() the actual runnable if it has not already run.  If it has run, become a no-op and return early.
This provides best-effort scheduling guarantees for the active TabGroup, even in the face of the active TabGroup changing before any of our speculative runnables have run.  It's also a lot more work, so it may be better to just accept the latency in event propagation between processes, given that localStorage is ugly and the sites that use it frequently use it in ugly ways.  The main problem we run into is people use it like it's BroadcastChannel() but that's not a universally cross-browser thing.
Bug 1350637 will not end up addressing the labeling or provide an opportunity to clean up the bug 1333981 correctness regression.  If there's an easy way to write a test that can leverage a TabGroup being in the background with a de-prioritized or suspended queue, that'd be a great basis for me to provide a test that covers that regression.
(In reply to Andrew Sutherland [:asuth] from comment #8)
> Bug 1350637 will not end up addressing the labeling or provide an
> opportunity to clean up the bug 1333981 correctness regression.  If there's
> an easy way to write a test that can leverage a TabGroup being in the
> background with a de-prioritized or suspended queue, that'd be a great basis
> for me to provide a test that covers that regression.

Unfortunately there isn't any way to do that right now. Runnables still run in FIFO order. It's going to be another two weeks or so before anything like that is possible. In the mean time, the only way to get a runnable to run in between the StorageNotifierService::Broadcast runnables is to have some other thread post it while StorageNotifierService::Broadcast is running. That seems pretty hard to do reproducibly.

I still don't really understand how we're going to fix this. If the runnable that actually changes the stored value is SystemGroup, then that means that it can run while a page's JS code is suspended. So you could have a page that reads a localStorage value twice and gets two different values. Is that allowed via spec? How likely is it to break web pages?

Even if we made the change in a runnable labeled with some TabGroup, we still would have the same problem. Some other tab that's affected could observe the change while it's running.
(In reply to Bill McCloskey (:billm) from comment #9)
> I still don't really understand how we're going to fix this. If the runnable
> that actually changes the stored value is SystemGroup, then that means that
> it can run while a page's JS code is suspended. So you could have a page
> that reads a localStorage value twice and gets two different values. Is that
> allowed via spec? How likely is it to break web pages?

At https://html.spec.whatwg.org/multipage/webstorage.html#the-localstorage-attribute there's this nice big red warning text that lets us do anything we want:
"""
The localStorage attribute provides access to shared state. This specification does not define the interaction with other browsing contexts in a multiprocess user agent, and authors are encouraged to assume that there is no locking mechanism. A site could, for instance, try to read the value of a key, increment its value, then write it back out, using the new value as a unique identifier for the session; if the site does this twice in two different browser windows at the same time, it might end up using the same "unique" identifier for both sessions, with potentially disastrous effects.
"""

Having said that, if feasible, I think we should aim for maintaining run-to-completion and microtask and task-queue semantics.  The values returned by a LocalStorage read would only change after yielding control flow such that another event/task could run.  Very specifically, the following should always hold true.
  const a = localStorage['foo'];
  // literally anything that doesn't trigger a nested event loop, so no sync XHR/modal dialogs.
  assert(a === localStorage['foo']);
  Promise.resolve().then(() => { assert(a === localStorage['foo']); });

localStorage reads would be allowed to change between HTML spec "tasks" (but after the microtask checkpoint for a task), with the guarantee that you'll eventually get the defined "storage" event that tells you what the "old" and "new" values were *from the perspective of the code that performed the mutation*, but these may be well behind the current values that will be returned by a localStorage read.  If we have one process evolve the value from "a" -> "b" -> "c" and another process go "a" -> "x" -> "y", and the apparent sequence of changes to the value is ["a", "b", "x", "c", "y"], the mutation events will currently still be ("b" -> "c") and ("x" -> "y").  (Or at least as currently implemented.  If we weren't working to discourage people from using localStorage in general, we could pursue a more thorough standardization including specifying the events get fixed up to be consistent with what localStorage reads will return if issued in succession after each change.)

If we're still planning to do the magic quasi-"green"-threading multi-tasking in Quantum where we suspend JS execution before the task has fully concluded and switch native stacks, or otherwise run multiple JS tasks for the same origin in parallel in the same process, then we will break these arbitrary semantics without some work.  While we could hobble our performance by not doing those optimizations, the easiest thing would be to just ensure that LocalStorage reads latch their values and return that same value until the "task" and microtask checkpoints complete.  If we wanted to provide even greater (not required) consistency guarantees, the first read could latch the entire state of the Map.

Or we could just let the values change at any arbitrary point.  There might be some breakage, but it simplifies the implementation.
(In reply to Andrew Sutherland [:asuth] from comment #10)
> If we're still planning to do the magic quasi-"green"-threading
> multi-tasking in Quantum where we suspend JS execution before the task has
> fully concluded and switch native stacks, or otherwise run multiple JS tasks
> for the same origin in parallel in the same process, then we will break
> these arbitrary semantics without some work.

Yes, the purpose of this bug is to make green threading possible.

>  While we could hobble our
> performance by not doing those optimizations, the easiest thing would be to
> just ensure that LocalStorage reads latch their values and return that same
> value until the "task" and microtask checkpoints complete.  If we wanted to
> provide even greater (not required) consistency guarantees, the first read
> could latch the entire state of the Map.

OK, I guess that would work, and it would also fix the issue where the value would change before we fire the change event for it. Perhaps we could do some kind of copy-on-write thing to avoid copying in the common case.
This is an assigned P1 bug without activity in two weeks. 

If you intend to continue working on this bug for the current release/iteration/sprint, remove the 'stale-bug' keyword.

Otherwise we'll reset the priority of the bug back to '--' on Monday, August 28th.
Keywords: stale-bug
(Per :overholt's request, I'm taking this bug because this is one of the last big unlabeled pieces.)
Assignee: amarchesini → bugmail
Status: NEW → ASSIGNED
:overholt and I talked, we're triaging this into Fx58 based on the imminent end of the 57 cycle, the complexity of changes I had planned, and because cooperative multi-threading didn't make 57 so we're not facing any (new) correctness issues.  Also, those changes would have been largely mooted by :janv's DOMStorage/localStorage overhaul on bug 1286798 which could make Fx58, so just moving to depend on bug 1286798.

There are however, a number of localStorage intermittents and one so-frequent-that-it-got-disabled test (bug 1347690) hanging out that I'm going to address now so that overhaul is built on solid ground.
Assignee: bugmail → nobody
Status: ASSIGNED → NEW
Depends on: 1286798
Priority: P1 → P2
Moving to p3 because no activity for at least 1 year(s).
See https://github.com/mozilla/bug-handling/blob/master/policy/triage-bugzilla.md#how-do-you-triage for more information
Priority: P2 → P3
Component: DOM → DOM: Core & HTML
Product: Core → Core
You need to log in before you can comment on or make changes to this bug.