Open Bug 1626935 Opened 4 years ago Updated 2 years ago

IndexedDB is not fully usable before the event loop starts, impacts XPConnect JS services which must defer some logic or experience assertion failures: data.mRecursionDepth > mBaseRecursionDepth

Categories

(Core :: Storage: IndexedDB, defect, P3)

defect

Tracking

()

Tracking Status
firefox76 --- affected

People

(Reporter: Gijs, Unassigned)

References

Details

See assertion failures on https://treeherder.mozilla.org/#/jobs?repo=try&selectedJob=295911786&resultStatus=testfailed%2Cbusted%2Cexception%2Crunning%2Cpending%2Crunnable&revision=a99771ff5245ab7e6acd02e4b8320a3bda05270a :

GECKO(1226) | Assertion failure: data.mRecursionDepth > mBaseRecursionDepth, at /builds/worker/checkouts/gecko/xpcom/base/CycleCollectedJSContext.cpp:546

I don't really know what the deal is there.

Summary: Using a shared global DB object for remote settings without any promise awaiting first causes assertion failures - → Using a shared global DB object for remote settings without any promise awaiting first causes assertion failures - data.mRecursionDepth > mBaseRecursionDepth
See Also: → 1624983

So this happens with just https://phabricator.services.mozilla.com/D68465 applied to mozilla-central?

(In reply to Simon Giesecke [:sg] [he/him] from comment #1)

So this happens with just https://phabricator.services.mozilla.com/D68465 applied to mozilla-central?

On linux debug, yes, though I intend to work around this issue by re-introducing a delay (which is there today because we always await an async "open the db" method). So a perma-link to the specific diff is https://phabricator.services.mozilla.com/D68465?id=253652 . You can specify which diff to apply if you use arc patch; I don't think there's a switch to do so in moz-phab...

(In reply to :Gijs (he/him) from comment #2)

(In reply to Simon Giesecke [:sg] [he/him] from comment #1)

So this happens with just https://phabricator.services.mozilla.com/D68465 applied to mozilla-central?

On linux debug, yes, though I intend to work around this issue by re-introducing a delay (which is there today because we always await an async "open the db" method). So a perma-link to the specific diff is https://phabricator.services.mozilla.com/D68465?id=253652 . You can specify which diff to apply if you use arc patch; I don't think there's a switch to do so in moz-phab...

Alternatively, import https://hg.mozilla.org/try/rev/f6c74d7b9dd14785c31265dfe14d6d30bf08491e and remove the else clause at https://hg.mozilla.org/try/rev/f6c74d7b9dd14785c31265dfe14d6d30bf08491e#l1.609 which seems to fix the assertion for me locally (and I'm hopeful about try, https://treeherder.mozilla.org/#/jobs?repo=try&resultStatus=retry%2Csuccess%2Ctestfailed%2Cbusted%2Cexception%2Crunning%2Cpending%2Crunnable&revision=3333f08b727989cc15fb8733f4207011d4cd3741 )

I added some information to the Pernosco notebook, including the JS stack of the last event that completed on the main thread. However, it's not clear to me if the recursion depth is wrong, the assertion is wrong or if the method shouldn't be called at this time. There's this strange exception for OS X/Cocoa which was introduced by Bug 1261143...

Andrew, does the trace tell you anything about that?

Flags: needinfo?(bugmail)

So this is happening because of the specialized IndexedDB hooks put into the web platform that make sure that a transaction is marked as inactive after JS control flow yields the event loop. The spec things around this are:

(IndexedDB originally used the "metastable" state hook for this, but then the spec grew that specialized handling because of promises. Implementation realities are somewhat more awkward than the spec.)

The assertion is firing here and this whole situation is weird because there is no event loop on the stack. nsXREDirProvider::DoStartup is calling NS_CreateServicesFromCategory("profile-after-change", ...) which is using XPConnect to call into resource://gre/modules/URLDecorationAnnotationsService.jsm which is registered as a service by the antitracking component and then registered with the component manager, so the logic that's trying to make the transaction-cleanup happen correctly and compensate for nested event loops is very unhappy.

I think the 2 questions here are:

  1. Will XPConnect perform a microtask checkpoint on the way out of the JSM code via CallbackObject.cpp or similar? If so, this check can be made happier.
  2. Is it sane to be trying to run IndexedDB and similar APIs so early in startup? (It should probably be sane, but it's worth asking.)
Flags: needinfo?(bugmail)

(In reply to Andrew Sutherland [:asuth] (he/him) from comment #6)

The assertion is firing here and this whole situation is weird because there is no event loop on the stack. nsXREDirProvider::DoStartup is calling NS_CreateServicesFromCategory("profile-after-change", ...) which is using XPConnect to call into resource://gre/modules/URLDecorationAnnotationsService.jsm which is registered as a service by the antitracking component and then registered with the component manager, so the logic that's trying to make the transaction-cleanup happen correctly and compensate for nested event loops is very unhappy.

Is it sane to be trying to run IndexedDB and similar APIs so early in startup? (It should probably be sane, but it's worth asking.)

Johann / Dimi, is it expected we create the service so early, and does it need to access its DB immediately or actually only once it gets asked for anything (and is that likely to be for user navigation, or do we need it "even" for background requests we make ourselves)?

Flags: needinfo?(jhofmann)
Flags: needinfo?(dlee)

So this bug is a little complicated and I haven't tried to fully understand it yet, but to not block on that I'll broadly answer your question about the skip list service:

In the past we had very concrete breakage where folks were lagging on skip list updates that would have enabled them to play Facebook games (they started the browser, loaded Facebook from session restore or whatever and a bit later the service got its data. Apparently this is a reason to switch browsers. So we ensured that we bundle the skip list into binaries in bug 1581160.

That specific case, IIRC, was more about the fact that we first needed to download the remote settings collection, which of course had a huge delay sometimes. Generally I would still say that we should have the skip list service fully operational (having distributed its data to the url classifier) before any external (non-Mozilla) requests are made, though I can't point to concrete cases where this would have been necessary in the past.

I'm mostly worried about adding delay because the url classifier won't block on having this data, so if we make it lazy to the point that we miss early connections under certain conditions, it may be very hard for us to detect or verify, especially if the behavior might differ due to bad performance on end user machines.

Flags: needinfo?(jhofmann)

Not sure if I understand this correctly.
In Comment 6, it was talking about URLDecorationAnnotationsService.jsm.
And johann's comment was about UrlClassifierSkipListService.jsm, I think that is different?

For SkipList, we init it when URL Classifier receives the first request that needs to be classified (so internal resources aren't included).
As for URLDecorationAnnotationsService, I don't really understand it so I am not sure if we can delay its initialization.

Flags: needinfo?(dlee)

Oh, duh, we’re talking about the url decoration service. Sorry, please disregard comment 8 then. Thanks Dimi!

For the link decoration thing, it shouldn’t be needed on the first requests since it will only apply when document.referrer is accessed, maybe Steve or Baku can confirm that.

(In reply to Dimi Lee [:dimi][:dlee] from comment #9)

Not sure if I understand this correctly.
In Comment 6, it was talking about URLDecorationAnnotationsService.jsm.
And johann's comment was about UrlClassifierSkipListService.jsm, I think that is different?

For SkipList, we init it when URL Classifier receives the first request that needs to be classified (so internal resources aren't included).
As for URLDecorationAnnotationsService, I don't really understand it so I am not sure if we can delay its initialization.

Oh, interesting... but I suppose both use Remote Settings so both will go through this path, and whoever comes second will trip this assertion if I remove the await Promise.resolve() code - or that will start happening if anything else creates the DB before either of the two services run. From a perf perspective, I still have questions about when this code really needs to run, but I guess if at least one of them is running, as soon as anyone else touches remote settings earlier, things go pearshaped. I'll file a separate issue on how early this really needs to run.

(In reply to Andrew Sutherland [:asuth] (he/him) from comment #6)

Will XPConnect perform a microtask checkpoint on the way out of the JSM code via CallbackObject.cpp or similar? If so, this check can be made happier.

:arai, do you know what guarantees we have about microtask checkpoints when constructing xpcom services through xpconnect?

Flags: needinfo?(arai.unmht)

(this collided but I have to run away again)

(In reply to Andrew Sutherland [:asuth] (he/him) from comment #6)

  1. Is it sane to be trying to run IndexedDB and similar APIs so early in startup? (It should probably be sane, but it's worth asking.)

Apologies for not clarifying this better, I was trying to run away from the computer at the moment and was worried I'd lose track of this bug/comment. My concern was more one of "does enough of the runtime exist at this point for invariants that are necessary for complex Web APIs to work?" but without having the time to dig into things. Also, I conflated the concern about there not being an event loop on the stack.

I think the answer is probably yes, we're far enough in startup to deal with this and I guess I was just a little surprised that there wasn't already an event loop running at profile-after-change.

Specifically:

  • nsLayoutStatics::Initalize has completed, which is really only the startup phase we care about.
  • By the time we assert, 2 ChromeWorkers have already been created, "resource://services-settings/RemoteSettingsWorker.js" and "resource://gre/modules/osfile/osfile_async_worker.js"
  • A nested event loop has already been spun on the stack, it looks like by the "addons-startup" notification which happens prior to "profile-after-change".

But it does seem like, unlike CalbackObject.cpp, nsXPCWrappedJS::CallMethod does not cause a microtask checkpoint to be triggered as the XPConnect call from C++ into JS unwinds.

:bholley, what's the XPConnect stance on microtask checkpoints in situations like this (no event loop on the stack, but a reasonable time for JS XPCOM to be running as we're in "profile-after-change" which is a responsible time to initialize stuff). Thanks!

Flags: needinfo?(bholley)
See Also: → 1630939

(In reply to :Gijs (he/him) from comment #11)

(In reply to Andrew Sutherland [:asuth] (he/him) from comment #6)

Will XPConnect perform a microtask checkpoint on the way out of the JSM code via CallbackObject.cpp or similar? If so, this check can be made happier.

:arai, do you know what guarantees we have about microtask checkpoints when constructing xpcom services through xpconnect?

I'm not aware of any relation between microtask checkpoints and XPCOM/XPConnect themselves.
to my understanding they're orthogonal.

Flags: needinfo?(arai.unmht)

(In reply to Andrew Sutherland [:asuth] (he/him) from comment #12)

:bholley, what's the XPConnect stance on microtask checkpoints in situations like this (no event loop on the stack, but a reasonable time for JS XPCOM to be running as we're in "profile-after-change" which is a responsible time to initialize stuff). Thanks!

As to the question of whether XPCWrappedJSClass::CallMethod should do a MicroTask checkpoint the same way WebIDL Callbacks do, I don't have a strong opinion. I can see a conceptual argument for doing so, so that e.g. promise callbacks queued from JS-implemented XPCOM methods would have similar semantics to the DOM case. But it also seems like more potential complexity, and we've lived fine without them so far.

In general, I think it's pretty problematic to be running JS at all without an event loop. Much of today's JS assumes there is an event loop, and I'd encourage us to make sure that property holds for any point during startup where we plan to execute a lot of random scripted initialization code.

Flags: needinfo?(bholley)

The priority flag is not set for this bug.
:asuth, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(bugmail)

Thanks, :bholley! It also makes sense to me that code that depends on an event loop existing should probably wait until the event loop exists.

This does somewhat raise the meta-issue that the JS code in question is using JS promises which won't actually be processed until some future turn of the event loop.

I don't think there's any immediate action to be taken here as the problem has been worked around, but I'm going to leave the bug open for discoverability and tracking.

Flags: needinfo?(bugmail)
Priority: -- → P3
Summary: Using a shared global DB object for remote settings without any promise awaiting first causes assertion failures - data.mRecursionDepth > mBaseRecursionDepth → IndexedDB is not fully usable before the event loop starts, impacts XPConnect JS services which must defer some logic or experience assertion failures: data.mRecursionDepth > mBaseRecursionDepth

Because this bug's Severity has not been changed from the default since it was filed, and it's Priority is P3 (Backlog,) indicating it has been triaged, the bug's Severity is being updated to S3 (normal.)

Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.