Closed Bug 1350637 Opened 7 years ago Closed 7 years ago

Move LocalStorage from PContent-managed PStorage to PBackground-managed PBackgroundStorage (was: PStorage::Msg_Preload sync IPC takes too long)

Categories

(Core :: DOM: Core & HTML, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla57
Performance Impact high
Tracking Status
firefox56 --- fixed
firefox57 --- fixed

People

(Reporter: ehsan.akhgari, Assigned: janv)

References

(Blocks 1 open bug)

Details

(Keywords: perf)

Attachments

(11 files, 4 obsolete files)

2.85 KB, patch
billm
: review+
Details | Diff | Splinter Review
2.00 KB, patch
asuth
: review+
Details | Diff | Splinter Review
18.26 KB, patch
asuth
: review+
Details | Diff | Splinter Review
4.80 KB, patch
asuth
: review+
Details | Diff | Splinter Review
3.55 KB, patch
billm
: review+
Details | Diff | Splinter Review
19.94 KB, patch
asuth
: review+
Details | Diff | Splinter Review
27.44 KB, patch
asuth
: review+
billm
: review+
Details | Diff | Splinter Review
50.35 KB, patch
asuth
: review+
Details | Diff | Splinter Review
19.62 KB, patch
asuth
: review+
Details | Diff | Splinter Review
17.79 KB, patch
janv
: review+
Details | Diff | Splinter Review
7.02 KB, patch
asuth
: review+
Details | Diff | Splinter Review
According to the LOCALDOMSTORAGE_PRELOAD_PENDING_ON_FIRST_ACCESS telemetry on release, 25% of localStorage accesses block and lead to this sync IPC currently.  On nightly, the median time is 19.3 which is quite terrible.

It seems that the majority of these are from the getvalue callsite, and we have more detailed telemetry on the release channel on that: <https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2017-03-16&keys=__none__!__none__!__none__&max_channel_version=release%252F52&measure=LOCALDOMSTORAGE_GETVALUE_BLOCKING_MS&min_channel_version=null&processType=*&product=Firefox&sanitize=1&sort_keys=submissions&start_date=2017-03-02&table=0&trim=1&use_submission_date=0>

Things aren't looking very good, given how commonly used this API is. :-(

Honza, what are our options here in order to improve this situation?
Flags: needinfo?(honzab.moz)
Isn't this about localStorage which is horrible synchronous API, and the preloading just tries to overcome that horrendousness in common cases, without guarantees that it is helping in all the cases.
Sure, but it doesn't mean that we should just accept the current state and not try to improve anything.  I'm not familiar enough to know what we can improve, but there is certainly room for improvement.

For example, right now we kick off the preload from here: <http://searchfox.org/mozilla-central/rev/7419b368156a6efa24777b21b0e5706be89a9c2f/dom/base/nsGlobalWindow.cpp#3294>.  With e10s, this is extremely inefficient, we have knows *way* before in the parent process that we are loading a document, why wait all this time to download the HTML from the web server and parse it and create a document from it etc before kicking off the preload?

Also, can we do something more drastic?  For example, instead of IPCing to the main thread which is busy doing all sorts of stuff, can we do this work on PBackground?

I'm hoping Honza can fill in with some better ideas...
Well, localStorage is sync, so you must block main thread when accessing it from JS.
Not sure what moving anything to PBackground here would help.

But initiating preload could indeed happen in parent process. If we're lucky, it is ready before child
needs the data.
(In reply to Olli Pettay [:smaug] from comment #3)
> Well, localStorage is sync, so you must block main thread when accessing it
> from JS.

Definitely.

> Not sure what moving anything to PBackground here would help.

It can help if the reason behind the long latency is the target of the IPC being busy processing other messages, which the main thread can be...  For example, typically the sync IPCs that we do during tab opening take longer than those that happen at times the parent is guaranteed to be idle.
Please from now on turn with questions regarding DOMStorage to Jan Varga, I no longer maintain this code.


To answer: 

if you know about an earlier spot to start the preload, do so.  It could mitigate the problem a lot.  I'd try and see.

If you don't like the sync message, let you know it's optional.  The load is probably already happening asynchronously.  The sync message was only a "kick" to try to wait less time for the data coming from the parent.  If the sync message doesn't work well in e10s, it could be turned off, the data will get to the child anyway via the async preload.

There is a bug for PBackgrounding this... not having the number in hand.  There is not much of a progress on it anyway.

The localStorage API is a bad design.  It's sync.  So you have to block.
Flags: needinfo?(honzab.moz)
Jan and Ehsan are scheduled to discuss options here in < 48 h so setting needinfo for any outcomes.
Flags: needinfo?(jvarga)
Priority: -- → P2
Whiteboard: [qf]
Whiteboard: [qf] → [qf:p1]
(In reply to Andrew Overholt [:overholt] from comment #6)
> Jan and Ehsan are scheduled to discuss options here in < 48 h so setting
> needinfo for any outcomes.

I started working on moving PStorage from PContent to PBackground.
It should help with reducing main thread blocking in the parent process.
Flags: needinfo?(jvarga)
Will we also move the localStorage preload start to a sooner point?  If it can be done, it's probably pretty easy to do with a huge benefit.
(In reply to Honza Bambas (:mayhemer) from comment #8)
> Will we also move the localStorage preload start to a sooner point?  If it
> can be done, it's probably pretty easy to do with a huge benefit.

So currently it's triggered in nsGlobalWindow::SetNewDocument(), I guess you know a better place, where would you do it instead ?
Keep in mind that because the localStorage e10s changes in bug 1285898 use nsGlobalWindow as the change propagation mechanism, you *must* have an nsGlobalWindow around to process updates as soon as you kick off the pre-load as things are currently implemented.  This limit how early you can move the pre-load without non-trivial enhancements.  Anything else will lead to desynchronization of localStorage state.
(In reply to Jan Varga [:janv] from comment #9)
> (In reply to Honza Bambas (:mayhemer) from comment #8)
> > Will we also move the localStorage preload start to a sooner point?  If it
> > can be done, it's probably pretty easy to do with a huge benefit.
> 
> So currently it's triggered in nsGlobalWindow::SetNewDocument(), I guess you
> know a better place, where would you do it instead ?

No, I don't.  See https://bugzilla.mozilla.org/show_bug.cgi?id=1350637#c2, so please ask Ehsan.
(In reply to Andrew Sutherland [:asuth] from comment #10)
> Keep in mind that because the localStorage e10s changes in bug 1285898 use
> nsGlobalWindow as the change propagation mechanism, you *must* have an
> nsGlobalWindow around to process updates as soon as you kick off the
> pre-load as things are currently implemented.  This limit how early you can
> move the pre-load without non-trivial enhancements.  Anything else will lead
> to desynchronization of localStorage state.

Not sure I follow.  The preload blocks (until done) any modifications by content on the localStorage object.  The preload is a very much background thing.
(In reply to Honza Bambas (:mayhemer) from comment #11)
> No, I don't.  See https://bugzilla.mozilla.org/show_bug.cgi?id=1350637#c2,
> so please ask Ehsan.

Ah, ok. So currently I don't have any plans to change the trigger point. We want to move PStorage from PContent to PBackground first. If that doesn't proof to be sufficient performance-wise then we can consider to change the trigger point.
(In reply to Honza Bambas (:mayhemer) from comment #12)
> Not sure I follow.  The preload blocks (until done) any modifications by
> content on the localStorage object.  The preload is a very much background
> thing.

In a single process, yes.  But e10s localStorage change propagation occurs via (naive) broadcast that's wholly separated from the database bridge.  If another process does something and there's not an nsGlobalWindow to receive and process the change (which depends on the synchronous preload then triggering), then that change will be lost and never perceived by that content process.

Again, there are (hacky) enhancements possible, but localStorage is really due for a (non-hacky) overhaul to clean up the e10s case.
(In reply to Jan Varga [:janv] from comment #13)
> (In reply to Honza Bambas (:mayhemer) from comment #11)
> > No, I don't.  See https://bugzilla.mozilla.org/show_bug.cgi?id=1350637#c2,
> > so please ask Ehsan.
> 
> Ah, ok. So currently I don't have any plans to change the trigger point. We
> want to move PStorage from PContent to PBackground first. If that doesn't
> proof to be sufficient performance-wise then we can consider to change the
> trigger point.

I think you should change the trigger point _as well_ if easy.
(In reply to Honza Bambas (:mayhemer) from comment #11)
> (In reply to Jan Varga [:janv] from comment #9)
> > (In reply to Honza Bambas (:mayhemer) from comment #8)
> > > Will we also move the localStorage preload start to a sooner point?  If it
> > > can be done, it's probably pretty easy to do with a huge benefit.
> > 
> > So currently it's triggered in nsGlobalWindow::SetNewDocument(), I guess you
> > know a better place, where would you do it instead ?
> 
> No, I don't.  See https://bugzilla.mozilla.org/show_bug.cgi?id=1350637#c2,
> so please ask Ehsan.

When I fix bug 1360723, you can add this code to ContentParent::AboutToLoadDocumentForChild().  That function will be called for all loads of documents from HTTP(S), FTP and wyciwyg channels.

Would that work for your needs here?
Depends on: 1360723
(In reply to :Ehsan Akhgari (super long backlog, slow to respond) from comment #16)
> When I fix bug 1360723, you can add this code to
> ContentParent::AboutToLoadDocumentForChild().  That function will be called
> for all loads of documents from HTTP(S), FTP and wyciwyg channels.
> 
> Would that work for your needs here?

Yeah, but I still think we should do it separately. PContent -> PBackground first and then change the trigger point.
(In reply to Jan Varga [:janv] from comment #17)
> (In reply to :Ehsan Akhgari (super long backlog, slow to respond) from
> comment #16)
> > When I fix bug 1360723, you can add this code to
> > ContentParent::AboutToLoadDocumentForChild().  That function will be called
> > for all loads of documents from HTTP(S), FTP and wyciwyg channels.
> > 
> > Would that work for your needs here?
> 
> Yeah, but I still think we should do it separately. PContent -> PBackground
> first and then change the trigger point.

I agree that we should do them separately (they're really independent from each other.)

But why not do both together?  I think moving the trigger point is really easy, it's just a matter of calling the parent side of the preload IPC message (https://searchfox.org/mozilla-central/rev/38fdbbcaa4be3a3f5b0d207dc5d53fb687c42d6a/dom/storage/StorageIPC.cpp#406) in AboutToLoadDocumentForChild(), isn't it?
(In reply to :Ehsan Akhgari (super long backlog, slow to respond) from comment #18)
> (In reply to Jan Varga [:janv] from comment #17)
> > (In reply to :Ehsan Akhgari (super long backlog, slow to respond) from
> > comment #16)
> > > When I fix bug 1360723, you can add this code to
> > > ContentParent::AboutToLoadDocumentForChild().  That function will be called
> > > for all loads of documents from HTTP(S), FTP and wyciwyg channels.
> > > 
> > > Would that work for your needs here?
> > 
> > Yeah, but I still think we should do it separately. PContent -> PBackground
> > first and then change the trigger point.
> 
> I agree that we should do them separately (they're really independent from
> each other.)
> 
> But why not do both together?  I think moving the trigger point is really
> easy, it's just a matter of calling the parent side of the preload IPC
> message
> (https://searchfox.org/mozilla-central/rev/
> 38fdbbcaa4be3a3f5b0d207dc5d53fb687c42d6a/dom/storage/StorageIPC.cpp#406) in
> AboutToLoadDocumentForChild(), isn't it?

Do note that we still need to call it from the previous location. That function is only called for some document loads, and isn't called for, for example, loads which are intercepted by service workers.
Yes, and note that doing the preload twice (or more times) is OK to do.  Following preloads are just no-ops.
What do you think, Jan?
Flags: needinfo?(jvarga)
(In reply to :Ehsan Akhgari (super long backlog, slow to respond) from comment #21)
> What do you think, Jan?

Ok, I can try to do it together, but given asuth's concern it can take more time.
Flags: needinfo?(jvarga)
(In reply to Jan Varga [:janv] from comment #22)
> (In reply to :Ehsan Akhgari (super long backlog, slow to respond) from
> comment #21)
> > What do you think, Jan?
> 
> Ok, I can try to do it together, but given asuth's concern it can take more
> time.

Thanks, and fair enough.  :-)
(Jan's on this so I'm assigning it to him. He's currently doing a few other things but has started work here and will get back to it soon. Also, as people have stated elsewhere, this is probably the biggest localStorage-related win we can get in the short-term.)
Assignee: nobody → jvarga
This is almost ready for review. I'm currently testing it on try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=86fc9a6c2af2c24b733039c610ddbbadc5c82803
(In reply to Jan Varga [:janv] from comment #25)
> This is almost ready for review. I'm currently testing it on try:
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=86fc9a6c2af2c24b733039c610ddbbadc5c82803

I would suggest moving PContent::BroadcastLocalStorageChange over to PBackground too for consistency, correctness, and sanity.  The separate channels for PBackground and PContent means we are unable to reason about when messages will be delivered between the two (without performing synchronizing ping-pongs).  If we leave it on PContent, I believe it's possible for a process to see change events that are already reflected in the state retrieved from the DB.  It's also possible for it to miss a series of change events.  At least I think... I haven't done the full legwork... this stuff gets tricky to reason about, which is why just having it all happen via PBackground seems like the best course of action.  Especially since it's equivalent to the ordering constraints we have before moving the logic to PBackground.)
Depends on: 1384618
Here's current try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=53ca08f1b874802338147203c3656a370cadefdf

There's only one failure related to my changes:
https://treeherder.mozilla.org/logviewer.html#?job_id=118035726&repo=try&lineNumber=45182

I updated my windows build environment to debug it.

I'm going to do some extra small cleanup and after that I'll submit patches for review.
I should be able to submit it tonight.

I'll try to fix the orange on try and also move the broadcasting after that.
Attachment #8890492 - Flags: review?(bugmail)
Attachment #8890723 - Flags: review?(bugmail)
Ok, that's it for now, you can start reviewing attached patches. try is "almost" green, there's still one orange:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=4161cb2e59d107db78b337a131fcc2bb9b38f0a6
Attachment #8890744 - Attachment is obsolete: true
Attachment #8890744 - Flags: review?(bugmail)
Attachment #8890760 - Flags: review?(bugmail)
Andrew, I have a question regarding storage event broadcasting/dispatching.
These methods are now defined on PContent as opposed to PStorage.
I guess the storage event should be fired in other processes even when there's no PStorage protocol open.
PContentChild is guaranteed to exist for all child processes.

Now if I move these broadcasting/dispatching methods to PBackground, there's no guarantee that PBackgroundChild exists for all content processes, so the storage event wouldn't be fired for them, right ?
Flags: needinfo?(bugmail)
(In reply to Jan Varga [:janv] from comment #37)
> Andrew, I have a question regarding storage event broadcasting/dispatching.
> These methods are now defined on PContent as opposed to PStorage.
> I guess the storage event should be fired in other processes even when
> there's no PStorage protocol open.
> PContentChild is guaranteed to exist for all child processes.
> 
> Now if I move these broadcasting/dispatching methods to PBackground, there's
> no guarantee that PBackgroundChild exists for all content processes, so the
> storage event wouldn't be fired for them, right ?

An excellent concern to have!  But there's good news!  For correctness reasons, during the e10s-ificiation of localstorage, we made it so adding a "storage" event listener causes the LocalStorage instance to be created for the given window:
https://searchfox.org/mozilla-central/source/dom/base/nsGlobalWindow.cpp#13703

It's true that a content process that did not already have a PBackgroundChild instantiated may miss storage events it might have received had the child already existed, but I believe this to be acceptable in a multi-process world with the inherent coordination latencies.  In particular, we'd expect similar latencies if we ever start filtering these broadcast events.  (Or if the storage events stop being their own thing and instead are associated with LocalStorageCache instances.)
Flags: needinfo?(bugmail)
Great, thanks for the info.
Comment on attachment 8890735 [details] [diff] [review]
Part 3: Move mozilla::dom::Optional serialization helper to ipc/glue/IPCMessageUtils.h to make it available to other consumers

Review of attachment 8890735 [details] [diff] [review]:
-----------------------------------------------------------------

::: ipc/glue/IPCMessageUtils.h
@@ +996,5 @@
> +  }
> +
> +  static bool Read(const Message* aMsg, PickleIterator* aIter, paramType* aResult)
> +  {
> +    bool was_passed = false;

wasPassed

@@ +1005,5 @@
> +
> +    aResult->Reset(); //XXX Optional_base seems to reach this point with isSome true.
> +
> +    if (was_passed) {
> +      if (!ReadParam(aMsg, aIter, &(aResult->Construct()))) {

No need for the parens.
Attachment #8890735 - Flags: review?(wmccloskey) → review+
I rebased the patches on top of bug 1283609 which is about converting |bool BackgroundChild::GetOrCreateForCurrentThread(callback)| to |PBackgroundChild* GetOrCreateForCurrentThread()|
I also moved local storage event broadcasting from PContent to PBackground, but there's a problem with a new method BackgroundParent::GetLiveActors(). I need to fix that, it's the last thing, besides addresing review comments.
Thanks, I was indeed waiting on the rebases to happen and stabilize before finishing out my review since this is the type of patch that's good to run with locally!  (Although I see now that I didn't hit submit on that comment, wherever that tab is, apologies.)  I believe I see the relevant try pushes under https://treeherder.mozilla.org/#/jobs?repo=try&author=jvarga@mozilla.com.
Attachment #8890492 - Attachment is obsolete: true
Attachment #8890492 - Flags: review?(bugmail)
Attachment #8892305 - Flags: review?(bugmail)
Attachment #8890723 - Attachment is obsolete: true
Attachment #8890723 - Flags: review?(bugmail)
Attachment #8892306 - Flags: review?(bugmail)
Depends on: 1283609
No longer depends on: 1384618
Comment on attachment 8892254 [details] [diff] [review]
Part 8: Implement BackgroundParent::GetLiveActorArray

Review of attachment 8892254 [details] [diff] [review]:
-----------------------------------------------------------------

This seems fine, although the whole mLiveActorArray thing is weird. Why do all ParentImpl actors have a field mLiveActorArray that seems to be equal to the static field ParentImpl::sLiveActorsForBackgroundThread? Well, whatever.

::: ipc/glue/BackgroundImpl.cpp
@@ +1140,5 @@
>  
>  // static
>  bool
> +ParentImpl::GetLiveActorArray(PBackgroundParent* aBackgroundActor,
> +                              nsTArray<PBackgroundParent*>& aLiveActorArray)

It might make sense to clear aLiveActorArray at the top of this function.
Attachment #8892254 - Flags: review?(wmccloskey) → review+
Comment on attachment 8890735 [details] [diff] [review]
Part 3: Move mozilla::dom::Optional serialization helper to ipc/glue/IPCMessageUtils.h to make it available to other consumers

Review of attachment 8890735 [details] [diff] [review]:
-----------------------------------------------------------------

::: ipc/glue/IPCMessageUtils.h
@@ +1002,5 @@
> +    if (!ReadParam(aMsg, aIter, &was_passed)) {
> +      return false;
> +    }
> +
> +    aResult->Reset(); //XXX Optional_base seems to reach this point with isSome true.

If we're already touching existing but moved code, maybe this XXX comment should be discarded while we're at it?

Specifically, the comment seems to be assuming the aResult was only ever default constructed and that the default constructor would never invoke Optional::Construct().  ParamTraits' contract seems under-specified in this regard.  Certainly if the target's Optional::Construct() was ever invoked, it absolutely is the correct case to call Reset() here so that the state conforms to what is being deserialized if !wasPassed.
Comment on attachment 8892307 [details] [diff] [review]
Part 6: Fix a deadlock when main process storage child actor triggers storage thread initialization

Review of attachment 8892307 [details] [diff] [review]:
-----------------------------------------------------------------

This cannot have been a fun patch to write!

::: dom/storage/StorageIPC.h
@@ +269,5 @@
>    CacheParentBridge* NewCache(const nsACString& aOriginSuffix,
>                                const nsACString& aOriginNoSuffix);
>  
>    RefPtr<ObserverSink> mObserverSink;
> +  nsString mProfilePath;

I'd suggest adding a comment along these lines:

A hack to deal with deadlock between the parent process main thread and background thread when invoking StorageDBThread::GetOrCreate because it cannot safely perform a synchronous dispatch back to the main thread (because we are already synchronously doing things on the stack).  Populated in the parent process, empty in content processes.
Attachment #8892307 - Flags: review?(bugmail) → review+
Attachment #8892305 - Flags: review?(bugmail) → review+
Comment on attachment 8892306 [details] [diff] [review]
Part 2: Core changes for LocalStorage on PBackground

Review of attachment 8892306 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/storage/StorageDBThread.cpp
@@ +138,5 @@
> +    , mMainThreadResultCode(NS_OK)
> +    , mWaiting(true)
> +  { }
> +
> +  nsresult

nit: I'd suggest prefixing the method with "Sync" and adding a comment like:

Because of the `sync Preload` IPC, we need to be able to synchronously initialize, which includes consulting and initializing some main-thread-only APIs.  Bug 1386441 discusses improving this situation.

::: dom/storage/StorageDBThread.h
@@ +31,5 @@
>  class StorageUsage;
>  
>  typedef mozilla::storage::StatementCache<mozIStorageStatement> StatementCache;
>  
> +// XXX Fix me!

These items don't appear to be addressed in any of the patches in the stack[1]; please file a spin-off bug to do this cleanup and put its bug reference in here.

1: For example, StorageDBChild::{AsyncClear*, AsyncFlush} that you added MOZ_CRASH invocations to have no reason to exist anymore, etc.

::: dom/storage/StorageIPC.cpp
@@ +822,5 @@
>        mozilla::Unused << mParent->SendLoadDone(mSuffix, mOrigin, mRv);
>        break;
>      }
>  
> +    mParent = nullptr;

good catch/fix on nulling this out on the actor's owning thread rather than having to deal with the race where the refcount hits zero on the I/O thread!  (And/or kudos on adding the ObserverSink::Stop AssertIsOnBackgroundThread() assertion in part 5 that maybe caught this? ;)

@@ +884,5 @@
>    MOZ_ASSERT(false);
>  }
>  
> +NS_IMETHODIMP_(void)
> +StorageDBParent::CacheParentBridge::Release(void)

Is there a reason this can't just be:
  NS_IMPL_RELEASE_WITH_DESTROY(StorageDBParent::CacheParentBridge, Destroy)
like you do in QuotaManager and indexedDB?  It would be NS_IMETHODIMP_(MozExternalRefCountType) instead of void, requiring corresponding changes in LocalStorageCacheBridge and LocalStorageCache, but those seem like odd legacy decisions.  Failing that, a comment would be useful here that documents what needs to be added to nsISupportsImpl.h to eliminate this modified boilerplate.

(https://searchfox.org/mozilla-central/source/xpcom/base/nsISupportsImpl.h#635 for docs/impl.)

@@ +960,5 @@
> +  MOZ_ALWAYS_SUCCEEDS(mOwningEventTarget->Dispatch(r, NS_DISPATCH_NORMAL));
> +}
> +
> +NS_IMETHODIMP_(MozExternalRefCountType)
> +StorageDBParent::UsageParentBridge::Release(void)

Similar to the above, why not NS_IMPL_RELEASE_WITH_DESTROY here?  The return type is already "correct".  And confusingly, this variant is lacking the `mRefCnt = 1; /* stabilize */` line but is otherwise identical usage?

::: dom/storage/StorageIPC.h
@@ +187,5 @@
>    private:
> +    void
> +    Destroy();
> +
> +    nsCOMPtr<nsIEventTarget> mOwningEventTarget;

nit: MozPromise wants the newly introduced nsISerialEventTarget (which differs from nsIEventTarget in that it guarantees serial execution, which means thread pools can't be nsISerialEventTargets until you stick a TaskQueue on them or whatever), so ideally we would be using that and GetCurrentThreadSerialEventTarget() instead of nsIEventTarget/GetCurrentThreadEventTarget() except in cases where we really have/want thread pool parallel execution semantics.
Attachment #8892306 - Flags: review?(bugmail) → review+
Attachment #8890736 - Flags: review?(bugmail) → review+
Comment on attachment 8890739 [details] [diff] [review]
Part 5: Factor out the parent actor observer sink to work on the PBackground thread, fix the rest of observer handling to use IPC

Review of attachment 8890739 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/storage/PBackgroundStorage.ipdl
@@ +40,5 @@
>                          nsString key);
>    async AsyncClear(nsCString originSuffix, nsCString originNoSuffix);
>    async AsyncFlush();
> +
> +  async Startup();

It might be appropriate to add a comment above these like:

These are privileged operations for use only by the observer API for delayed initialization and clearing origins and will never be used from content process children.  Ideally these would be guarded by checks or exist on a separate, privileged interface, but PBackgroundStorage is already insecure.

::: dom/storage/StorageIPC.cpp
@@ +413,5 @@
> +{
> +  nsCOMPtr<nsIEventTarget> mOwningEventTarget;
> +
> +  // Only touched on the PBackground thread.
> +  StorageDBParent* mActor;

I think this wants to be labeled with MOZ_NON_OWNING_REF (https://searchfox.org/mozilla-central/source/mfbt/Attributes.h#487) given that StorageDBParent is refcounted.
Attachment #8890739 - Flags: review?(bugmail) → review+
Attachment #8890815 - Flags: review?(bugmail) → review+
Comment on attachment 8892255 [details] [diff] [review]
Part 9: Move Local Storage event broadcasting from PContent to PBackground

Review of attachment 8892255 [details] [diff] [review]:
-----------------------------------------------------------------

It's really lovely how PBackground normalizes the parent process case, getting rid of many of the special cases.  (Which is extra nice given that localStorage had an edge case where changes in the content process would not be applied/heard in parent process pages, but changes in parent process pages would be applied/heard.)
Attachment #8892255 - Flags: review?(bugmail) → review+
Thanks for the comments, I'll address them ASAP.
However, I still need a review for some patches in bug 1283609.
So this will miss the uplift by a day or two.
We'll need to request beta approval.
(In reply to Bill McCloskey (:billm) from comment #49)
> ::: ipc/glue/BackgroundImpl.cpp
> @@ +1140,5 @@
> >  
> >  // static
> >  bool
> > +ParentImpl::GetLiveActorArray(PBackgroundParent* aBackgroundActor,
> > +                              nsTArray<PBackgroundParent*>& aLiveActorArray)
> 
> It might make sense to clear aLiveActorArray at the top of this function.

I added:
MOZ_ASSERT(aLiveActorArray.IsEmpty());
(In reply to Andrew Sutherland [:asuth] from comment #50)
> @@ +1002,5 @@
> > +    if (!ReadParam(aMsg, aIter, &was_passed)) {
> > +      return false;
> > +    }
> > +
> > +    aResult->Reset(); //XXX Optional_base seems to reach this point with isSome true.
> 
> If we're already touching existing but moved code, maybe this XXX comment
> should be discarded while we're at it?
> 
> Specifically, the comment seems to be assuming the aResult was only ever
> default constructed and that the default constructor would never invoke
> Optional::Construct().  ParamTraits' contract seems under-specified in this
> regard.  Certainly if the target's Optional::Construct() was ever invoked,
> it absolutely is the correct case to call Reset() here so that the state
> conforms to what is being deserialized if !wasPassed.

Ok, I removed the comment.
(In reply to Andrew Sutherland [:asuth] from comment #51)
> @@ +269,5 @@
> >    CacheParentBridge* NewCache(const nsACString& aOriginSuffix,
> >                                const nsACString& aOriginNoSuffix);
> >  
> >    RefPtr<ObserverSink> mObserverSink;
> > +  nsString mProfilePath;
> 
> I'd suggest adding a comment along these lines:
> 
> A hack to deal with deadlock between the parent process main thread and
> background thread when invoking StorageDBThread::GetOrCreate because it
> cannot safely perform a synchronous dispatch back to the main thread
> (because we are already synchronously doing things on the stack).  Populated
> in the parent process, empty in content processes.

I added this comment, but I modified the last sentence:
// Populated for the same process actors, empty for other process actors.
(In reply to Andrew Sutherland [:asuth] from comment #52)
> @@ +138,5 @@
> > +    , mMainThreadResultCode(NS_OK)
> > +    , mWaiting(true)
> > +  { }
> > +
> > +  nsresult
> 
> nit: I'd suggest prefixing the method with "Sync" and adding a comment like:

Added "Sync" prefix and also the comment.

> @@ +31,5 @@
> >  class StorageUsage;
> >  
> >  typedef mozilla::storage::StatementCache<mozIStorageStatement> StatementCache;
> >  
> > +// XXX Fix me!
> 
> These items don't appear to be addressed in any of the patches in the
> stack[1]; please file a spin-off bug to do this cleanup and put its bug
> reference in here.

Ok, filed bug 1387636 and added the reference.

> @@ +884,5 @@
> >    MOZ_ASSERT(false);
> >  }
> >  
> > +NS_IMETHODIMP_(void)
> > +StorageDBParent::CacheParentBridge::Release(void)
> 
> Is there a reason this can't just be:
>   NS_IMPL_RELEASE_WITH_DESTROY(StorageDBParent::CacheParentBridge, Destroy)
> like you do in QuotaManager and indexedDB?  It would be
> NS_IMETHODIMP_(MozExternalRefCountType) instead of void, requiring
> corresponding changes in LocalStorageCacheBridge and LocalStorageCache, but
> those seem like odd legacy decisions.  Failing that, a comment would be
> useful here that documents what needs to be added to nsISupportsImpl.h to
> eliminate this modified boilerplate.

Yeah, I did that in the first place, but the AddRef/Release logging was messed up:
LocalStorageCacheBridge implements AddRef/Release with "LocalStorageCacheBridge" string for logging
Now when I override just Release for CacheParentBridge, the string for logging would be
"StorageDBParent::CacheParentBridge". Not to mention that Release returns void in the base class.
It's a big mess and I don't want to deal with that in this bug, if you are ok with it.
I'll add a comment for that.

> ::: dom/storage/StorageIPC.h
> @@ +187,5 @@
> >    private:
> > +    void
> > +    Destroy();
> > +
> > +    nsCOMPtr<nsIEventTarget> mOwningEventTarget;
> 
> nit: MozPromise wants the newly introduced nsISerialEventTarget (which
> differs from nsIEventTarget in that it guarantees serial execution, which
> means thread pools can't be nsISerialEventTargets until you stick a
> TaskQueue on them or whatever), so ideally we would be using that and
> GetCurrentThreadSerialEventTarget() instead of
> nsIEventTarget/GetCurrentThreadEventTarget() except in cases where we really
> have/want thread pool parallel execution semantics.

Oh, good catch, thanks for pointing that out.
(In reply to Jan Varga [:janv] from comment #59)
> Yeah, I did that in the first place, but the AddRef/Release logging was
> messed up:
...
> It's a big mess and I don't want to deal with that in this bug, if you are
> ok with it.
> I'll add a comment for that.

Yeah, that's fine.  I'll sign off on any number of big messes as long as someone reading the code can figure out why the mess is there :)
I decided to test on try before landing:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=dab9e373bb636a736e71f65d3642c020df8e97f6&selectedJob=121143179

Looks good, but there's one test that initially failed (succeeded after a re-trigger):
https://treeherder.mozilla.org/logviewer.html#?job_id=121143179&repo=try

Andrew, you wrote this test, do you think this can be related to the PBackground move ?
Flags: needinfo?(bugmail)
(In reply to Jan Varga [:janv] from comment #61)
> Andrew, you wrote this test, do you think this can be related to the
> PBackground move ?

Seems like it.  `verifyTabStorageEvents` and `verifyTabStorageState` use the PContent-based Task.Spawn mechanism.  Ordering was previously guaranteed by everything being PContent, now there's a race with PBackground.  I think this I can address this by having page_lcoalStorage_e10s.js use the PBackground-based BroadcastChannel mechanism so that its returnAndClearStorageEvents() and getStorageState() methods are made async, blocking on having received a BroadcastChannel message from a call to mutateStorage().  Alternately, ContentTask.spawn could be made to support a PBackground-based routing.

I'll attempt a fix now using BroadcastChannel, push it to try on top of your series, and ask you to review and hopefully land with your stack.  Leaving my NI until the patch is up.

Relatedly, I was a little hasty in my analysis at https://bugzilla.mozilla.org/show_bug.cgi?id=1379568#c6 of the initial labeling patch regressing things.  As long as aImmediateDispatch=true, there is no regression.  (Our behavior is already unsafe, however, if the "storage" handler attempts to mutate the just-mutated key's value and there are multiple windows.  That said, any site that's doing that is at risk of causing an infinite ping-pong fight.)
Changed my mind about BroadcastChannel.  That's going to be too brittle when we label localStorage.  I'm just going to use an in-band sentinel storage event.  The advantage of BroadcastChannel was that we could fail quickly if we break storage events instead of hanging.
Ok, thank you!
This passed for me locally doing "--repeat 50" while just running the test.
I was able to reproduce the failure the same way, although how many times
I reproduced it was not clear; the output of the command is weird and the
leak detector was more than a little spammy about CookieServiceChild which
I believe is unlikely related to this patch (or your stack), and instead
more likely to have something to do with the recent cookie sync changes.

In the end, I ended up needing to also add a polling-check for the changes
to propagate since the tests explicitly attempt not to use "storage" event
listeners in at least one case.  The result is the least hacky thing I
could come up with.

I can provide any major changes desired, but if you have any minor fixes,
do feel free to make them and land, I know we all want this in sooner
rather than later.


Commit comment:

The e10s tests were written assuming a world where ContentTask.spawn and
LocalStorage were both PContent.  Also, that Quantum DOM labeling wasn't
something to worry about.  These assumptions no longer held, resulting in
the test intermittently failing if all changes hadn't propagated via
PBackground to the tab under test by the time the ContentTask.spawn
state retrieval call made it to the tab's main thread.

This has been corrected by using "storage" events where already in use and
polling where not in use.  Plase see the added comments for more details.
Attachment #8894437 - Flags: review?(jvarga)
Assignee: jvarga → bugmail
Status: NEW → ASSIGNED
(Whoops, I'm not sure why "hg bzexport" tried to steal the bug.  Maybe because it wasn't ASSIGNED yet?)
Assignee: bugmail → jvarga
Flags: needinfo?(bugmail)
(In reply to Andrew Sutherland [:asuth] from comment #65)
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=3d67b17f242c85916026b16fcd00f456d184d756

I added bc6 jobs (bc1 doesn't seem to run browser_localStorage_e10s.js) and unfortunately, there are some suspicious crashes related to browser_localStorage_e10s.js:
Error: PBackgroundStorage::Msg_Observe Route error: message sent to unknown actor ID
11:34:48     INFO -  GECKO(1708) | Hit MOZ_CRASH(MsgRouteError: PBackgroundStorage::Msg_Observe Route error: message sent to unknown actor ID) at z:/build/build/src/ipc/glue/BackgroundChildImpl.cpp:150

See https://treeherder.mozilla.org/logviewer.html#?job_id=121368006&repo=try&lineNumber=2212
I can't reproduce locally so far.
I'm guessing this can be related to the fact that PStorage actors don't handle ActorDestroy() at all.
I'm going to investigate that.
(In reply to Jan Varga [:janv] from comment #68)
> I added bc6 jobs (bc1 doesn't seem to run browser_localStorage_e10s.js)

Note that I used "./mach try -b do -p all dom/tests/browser/" which also has "--try-test-paths browser-chrome:dom/tests/browser" at the end of the trychooser line

This shows up in both the taskcluster-scheduled bc1's "Job Details" plus your bc6 (which looks to be equivalent in all but name, given that both end up running browser_local_storage_e10s.js from their logs) as:
Untitled data: --this-chunk=1 --total-chunks=1 -- dom/tests/browser
Untitled data: The following arguments were forwarded from mozharness to the test command:
Untitled data: Tests will be run from the following files: dom/tests/browser.

If you want the full suite, you'll need to do a new push, I think.  That or somehow supersede that command.
Oh, I see.
Strange, when I do normal try push:
try: -b do -p all -u mochitest-bc,mochitest-e10s-bc -t none

everything seems to be fine:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=72c7914e8739854014f9be8fdc38ceba18b5dbac
Comment on attachment 8894437 [details] [diff] [review]
Part 10: Update LocalStorage e10s tests for change to PBackground

Review of attachment 8894437 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks a lot for this patch!

::: dom/tests/browser/page_localstorage_e10s.html
@@ +99,5 @@
> +    function checkFunc() {
> +      if (localStorage.getItem(SENTINEL_KEY) === sentinelValue) {
> +        resolve();
> +      } else {
> +        // I believe linters will only yell at us if we use a non-zero constant.

What is "linters" ?
Attachment #8894437 - Flags: review?(jvarga) → review+
(In reply to Jan Varga [:janv] from comment #73)
> everything seems to be fine:
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=72c7914e8739854014f9be8fdc38ceba18b5dbac

I retried bc5 (where I found the test running) a whole bunch of times and managed to get you the same crash signature where ProcessHangMonitor::~ProcessHangMonitor() is spinning a nested event loop:
https://treeherder.mozilla.org/logviewer.html#?job_id=121493137&repo=try&lineNumber=3154

Some jobs are still pending, they may or may not crash too.
(In reply to Jan Varga [:janv] from comment #74)
> What is "linters" ?

I'm referring to our ESLint[1] jobs that look for common code errors in our JS code, particularly in the context of this recent dev-platform post about use of setTimeout: https://groups.google.com/d/msg/mozilla.dev.platform/wjCgVIoGHJk/JN9oihczAQAJ

1: https://developer.mozilla.org/en-US/docs/ESLint
Ok, so the crash is real. I'm going to investigate ...
Ok, I think I figured out the crash. It's a silly race between parent/child process.
I'll submit a patch for review soon.
Attachment #8895006 - Flags: review?(bugmail) → review+
Comment on attachment 8892305 [details] [diff] [review]
Part 1: Move PStorage stubs from PContent to PBackground

Man, I can't land this patch w/o a review from a IPC peer.
Bill, this just renames PStorage to PBackgroundStorage (and entire protocol is moved from PContent to PBackground), please rubber stamp this change. Thanks.
Attachment #8892305 - Flags: review?(wmccloskey)
Attachment #8892305 - Flags: review?(wmccloskey) → review+
Pushed by jvarga@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/f8241dee5ef3
Part 1: Move PStorage stubs from PContent to PBackground; r=asuth r=billm
https://hg.mozilla.org/integration/mozilla-inbound/rev/d5f2490d5138
Part 2: Core changes for LocalStorage on PBackground; r=asuth
https://hg.mozilla.org/integration/mozilla-inbound/rev/d91109b3ea08
Part 3: Move mozilla::dom::Optional serialization helper to ipc/glue/IPCMessageUtils.h to make it available to other consumers; r=billm
https://hg.mozilla.org/integration/mozilla-inbound/rev/e3e1b4922005
Part 4: Implement serialization for mozilla::OriginAttributesPattern, so we can use it on the receiver side of IPC without bouncing to the main thread; r=asuth
https://hg.mozilla.org/integration/mozilla-inbound/rev/a781f14e975b
Part 5: Factor out the parent actor observer sink to work on the PBackground thread, fix the rest of observer handling to use IPC; r=asuth
https://hg.mozilla.org/integration/mozilla-inbound/rev/371e75eb2b60
Part 6: Fix a deadlock when main process storage child actor triggers storage thread initialization; r=asuth
https://hg.mozilla.org/integration/mozilla-inbound/rev/28506433e4f0
Part 7: Convert asynchronous StorageDBParent initialization to be synchronous and fix low disk space checking; r=asuth
https://hg.mozilla.org/integration/mozilla-inbound/rev/ddbbf8aa33dc
Part 8: Implement BackgroundParent::GetLiveActorArray; r=billm
https://hg.mozilla.org/integration/mozilla-inbound/rev/bbc3dc385fac
Part 9: Move Local Storage event broadcasting from PContent to PBackground; r=asuth
https://hg.mozilla.org/integration/mozilla-inbound/rev/64f5a7a51c21
Part 10: Update LocalStorage e10s tests for change to PBackground. r=janv
https://hg.mozilla.org/integration/mozilla-inbound/rev/042120b49ceb
Part 11: Fix a race between parent/child process; r=asuth
Somebody should add the "Perf" key word.
Keywords: perf
Depends on: 1389316
Any chances of this getting into version 56?
Yes, we're going to request an approval next week.
Comment on attachment 8892305 [details] [diff] [review]
Part 1: Move PStorage stubs from PContent to PBackground

Approval Request Comment
[Feature/Bug causing the regression]: It's not a regression.
[User impact if declined]: This bug is about moving PStorage from PContent to PBackground, the main benefit of this move is improved performance. PStorage protocol contains a sync message which can't be changed to async message. Synchronous messages under PContent top level protocol block main process/main thread which is usually already very busy.
[Is this code covered by automated tests?]: Yes, we have many automated tests for Local Storage implementation.
[Has the fix been verified in Nightly?]: Yes, this landed 8 days ago. There's only one not very frequent crash bug possibly related to this landing and is under investigating.
[Needs manual test from QE? If yes, steps to reproduce]: No manual tests needed.
[List of other uplifts needed for the feature/fix]: Bug 1283609
[Is the change risky?]: Medium risk.
[Why is the change risky/not risky?]: The change is quite big, but the benefit is also quite big, there's just one possible regression/crash so far. Stuff seems to work as expected, but we need broader user audience to be absolutely sure about this change and that's the beta channel.
[String changes made/needed]: None
Attachment #8892305 - Flags: approval-mozilla-beta?
Comment on attachment 8892305 [details] [diff] [review]
Part 1: Move PStorage stubs from PContent to PBackground

Aiming this at 56 beta 5, along with the work supporting QF in bug 1283609. Big changes but should have a good perf benefit for users.
Attachment #8892305 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(In reply to Jan Varga [:janv] from comment #87)
> [Is this code covered by automated tests?]: Yes, we have many automated
> tests for Local Storage implementation.
> [Has the fix been verified in Nightly?]: Yes, this landed 8 days ago.
> There's only one not very frequent crash bug possibly related to this
> landing and is under investigating.
> [Needs manual test from QE? If yes, steps to reproduce]: No manual tests
> needed.

Settin qe-verify- based on Jan's assessment on manual testing needs and the fact that this fix has automated coverage.
Flags: qe-verify-
(Renaming bug for clarity on what this bug did.)
Summary: PStorage::Msg_Preload sync IPC takes too long → Move LocalStorage from PContent-managed PStorage to PBackground-managed PBackgroundStorage (was: PStorage::Msg_Preload sync IPC takes too long)
Component: DOM → DOM: Core & HTML
Performance Impact: --- → P1
Whiteboard: [qf:p1]
You need to log in before you can comment on or make changes to this bug.