Closed Bug 1345421 Opened 7 years ago Closed 7 years ago

Intermittent dom/tests/browser/browser_localStorage_e10s.js | Storage existence === preload - Got false, expected true

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla56
Tracking Status
firefox-esr52 --- unaffected
firefox54 --- unaffected
firefox55 --- fixed
firefox56 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: asuth)

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell fixed:timing])

Attachments

(2 files)

This looks like test bucketing issue. Hit inbound yesterday, today it's only on autoland.

Andrew, can you take a look at this or find an owner, please? Thank you.
Flags: needinfo?(bugmail)
Whiteboard: [stockwell needswork]
I'll check it out.  This test wants an update to bug 1345990's much superior mechanism for ensuring fresh processes, anyways.
Assignee: nobody → bugmail
Status: NEW → ASSIGNED
I believe I've identified the problem as a race related to DB thread flushes.  mOriginsHavingData is only updated when the db operations are flushed and are synchronously retrieved from the main thread when requested by the actor, not bounced off the database thread (with implied flush).  I was able to reproduce the problem locally and this fixed it, so I've pushed that to try.

The other remaining item is to ensure that the freshly created child reliably receives and processes RecvOriginsHavingData prior to nsGlobalWindow::SetNewDocument being invoked in the content process.  Otherwise the test passes because of a false-positive because of the the special-case logic where preload decides to preload everything until it receives that info.  I was using rr to validate this but it freaked out, so I'll have to upgrade and/or use another avenue.  I'm hoping there's something that was not obvious in my code inspection making this safe by falling inside some reliably enclosing control-flow, because it does currently seems like there's a race.
(In reply to Andrew Sutherland [:asuth] from comment #11)
> The other remaining item is to ensure that the freshly created child
> reliably receives and processes RecvOriginsHavingData prior to
> nsGlobalWindow::SetNewDocument being invoked in the content process. 

It works out with current ordering, but it's somewhat accidental.  LocalStorageManager creation is being triggered as a side-effect of a testing frame script.  And we need a proper PContent/PBrowser round-trip to ensure ordering and we're only getting that because PNecko is currently main thread and so the HTTP channel gets us a round-trip.  Otherwise the guaranteed requests all seem to be unidirectional async to the content process and also dependent on an incidental frame script load.  Especially since HTTP channel loads are moving increasingly to PBackground, I'm going to make the test safer by splitting the tab opening into 2 phases.  We'll load about:blank, then loadURI the browser to the actual page.

=== Details

The mochitest infra file browser-test.js registers `chrome://mochikit/content/tests/SimpleTest/AsyncUtilsContent.js` with the global message manager as a frame script that should be loaded into all processes.  When the child process evaluates the script, the line `EventUtils.navigator = content.document.defaultView.navigator;` induces startup of the LocalStorageManager and creation of our IPC actor with a native stack whose highlights look like:

mozilla::dom::StorageDBChild::StorageDBChild
mozilla::dom::LocalStorageCache::StartDatabase
mozilla::dom::LocalStorageManager::LocalStorageManager
CallGetService
nsGlobalWindow::PreloadLocalStorage
nsGlobalWindow::SetNewDocument
nsDocShell::CreateAboutBlankContentViewer
nsDocShell::EnsureContentViewer
nsDocShell::GetDocument

aside: rr doesn't like xvfb-run sometimes.
Bug 1345990 introduced a "forceNewProcess" argument in
BrowserTestUtils.openNewForegroundTab.  By switching to this we can
stop bloating the process count pref to try and produce equivalent
results.  To minimize test churn and because it doesn't really hurt to
double-check, the code that asserts that our tabs are each in different
processes and related book-keeping infrastructure have been left intact.

We also set a preference to disable preallocated processes in the interest
of maximizing test consistency and minimizing breakage.  It's conceivable
that a preallocated process might end up creating its StorageDBParent
actor prior to when we want, breaking things.  By ensuring the process
isn't created until we want it, we avoid a lot of brittleness.
Attachment #8877643 - Flags: review?(amarchesini)
The intermittent failure appears to have been due to mOriginsHavingData
only being updated when the db thread flushes.  The db thread has a
hard-coded 5 second flush interval.  It's likely that e10s startup was
previously so slow that we were assured of having a flush happen by the
time our fresh process created its parent actor.

We correct this by reliably ensuring a flush before spinning up the
process to check preload state.  We also ensure a flush at the start of
the test for our check that there was no preload in the initial cases.
We were actually more vulnerable in that case, I believe, but as a
browser chrome test, there were no other tests that would have used
content localStorage.

We additionally ensure that the content process has received and
populated mOriginsHavingData by having the tab opening process wait for
about:blank to load in the process before actually opening our origin.
Prior to this change we were depending on orderings that aren't
guaranteed.
Attachment #8877644 - Flags: review?(amarchesini)
Attachment #8877643 - Flags: review?(amarchesini) → review+
Attachment #8877644 - Flags: review?(amarchesini) → review+
Patches pushed, clearing needinfo.
Flags: needinfo?(bugmail)
https://hg.mozilla.org/mozilla-central/rev/9c5366883d69
https://hg.mozilla.org/mozilla-central/rev/13c27668f2d7
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Whiteboard: [stockwell needswork] → [stockwell fixed]
Flags: qe-verify-
Whiteboard: [stockwell fixed] → [stockwell fixed:timing]
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: