Closed Bug 987931 Opened 10 years ago Closed 10 years ago

[email] cronsync's self-close races an IndexedDB commit

Categories

(Firefox OS Graveyard :: Gaia::E-Mail, defect)

x86_64
Linux
defect
Not set
normal

Tracking

(b2g-v2.0 affected, b2g-v2.1 fixed)

RESOLVED FIXED
2.1 S2 (15aug)
Tracking Status
b2g-v2.0 --- affected
b2g-v2.1 --- fixed

People

(Reporter: asuth, Assigned: asuth)

References

Details

Attachments

(2 files)

On 1.4 trunk I was seeing a lot if situations like this last week with 2 active account types:
03-17 15:53:26.699 I/GeckoDump(  873): LOG: mail sync complete, closing mail app 13420.936635999999
03-17 15:53:26.759 E/GeckoConsole(  137): [JavaScript Error: "IndexedDB UnknownErr: IDBTransaction.cpp:863"]

We should stop doing this.  The optimal way to do it is probably to bolster the logging that shows up in console.log a bit more since that's a general goal.  (Accomplishing that by exposing the existing logging in the back-end rather than adding more console.log, however.)
This has been addressed on bug 921050 and bug 1039007 addressing the issue for v2.1.  Bug 1042221 has been created to uplift the fixes to v2.0.
See Also: → 1042221
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Whoops.  Not actually fixed.  Reopening and now we're going to get a nice back-end test for this.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Blocks: 1042221
Assignee: nobody → bugmail
Status: REOPENED → ASSIGNED
Test infrastructure changes (and why):

- mozAlarms refused to add alarms if we weren't an install app, so we're now an installed app

- A reduced version of the email app manifest now lives at test/manifest.webapp.  It's how we get our permissions.

- There's a lot less kludgy iframe stuff with poking permissions, etc.  Instead we just use the iframe "mozapp" magic.  This necessitated having the rest runner load a (chrome) HTML iframe inside its (outer chrome) XUL iframe, since XUL iframes are entirely different from HTML iframes.  I also removed a lot of dead/unused code.

- There was some devicestorage fallout from that change since it made us only have "readcreate" access and not "readwrite" access.  There also may have been some side-effects from us being an app instead of just a magically authorized origin.  A good change that came from this is that I noticed devicestorage added an override mechanism so we are now able to have our separate test file runs (including variants) have actually distinct directories they use instead of trying to manually clean up after themselves.  A net win is that we're now more realistic when it comes to attachment stuff.

- Made the actor.ignore_* not result in sadness if you then tried to do .expect_* for the same thing after the ignore.  Previously it guaranteed failure, now we're okay.  I may have ended up not needing this after all was said and done.

- Made it so tests can have multiple different accounts active.  While creating the cronsync test it turned out there were a lot of assumptions that we only had one account and then a bunch of hacks/kludges to deal with that.  Most notable is probably now that we also have a testAccount.foldersSlice in addition to the testUniverse.allFoldersSlice footgun.  (Finding the first 'inbox' in a list of two accounts' folders when you actually want the second inbox is clearly not great.)

- moved the outbox tests' shared code (shared_outbox.js) into th_main.js because the cronsync tests wanted them.  Some cleanup.


Connection management changes:

- Wrapped the decision to call closeUnusedConnections into maybeCloseUnusedConnections

- When we run out of job-ops to process, we now call `allOperationsCompleted` on the account.  In the case of IMAP it then calls `maybeCloseUnusedConnections`.  This was needed for cronsync so we would kill the connection after the snippets were fetched because `areServerJobsWaiting` returns true while the downloadBodies job was cleaning itself up, which was the only time we'd consider kill the connection off prior to the data staleness death criterion.


Cronsync changes:
- Added a lot more commentary to cronsync.js

- Its constants were moved to syncbase.js and one constant split into two separate ones.

- Added some more log events for the benefit of testing.

- Changed the slice life-cycle so we don't kill it until after the downloadBodies snippet-fetching would have been enqueued.

(Prior to this patch I think we might end up killing the connection after the sync completed, then established a new connection for downloadBodies, then left that hanging around.)

- test_cronsync_wait_for_completion.js added


What still remains to be done:
- Make the cronsync test work for ActiveSync and POP3.  I think they're mainly going to be angry about connection stuff.  I hope I hope I hope.

- Do on-device testing to make sure we are never racing shutdown anymore.  I think these tests provide a fairly strong guarantee, but it never hurts to be thorough.


What I think makes sense as future-work:
- Some more cronsync tests related to being offline or involving other weird edge cases.


Uplift concerns:
- I had to touch some outbox code that doesn't exist on v2.0.  I think it's all survivable, but I think the uplift is going to require you to do a (hopefully small) review pass on that too since the cronsync control-flow changed slightly with the introduction of the outbox.  We're doing a lazy man's uplift, so the good news is all the back-end testing horrors don't apply to us.
Attachment #8468253 - Flags: review?(m)
Okay, the tests now work on ActiveSync and POP3, so if you were holding out on that for your review, hold out no more!  I still need to do the on-device testing, but that's going to work.  (I ended up spending more time on the pinch/zoom than planned and the verizon/netflix mozilla fallout (bug 1040297) is making getting a trunk build more of a hassle than I want to deal with at this late hour.  Also, plumbing entropy is horrible.  Like, literal plumbing, not software plumbing.  I love me some software plumbing...)
Flags: needinfo?(m)
Attachment #8468243 - Flags: review?(m) → review+
Flags: needinfo?(m)
Comment on attachment 8468253 [details] [review]
GELAM PR: cronsync IMAP tests plus many required cleanups in tests, POP3/ActiveSync as trivial in-bug follow-up

Yuck, sounds like a literal and/or figurative mess.

These changes look good to me, a lot of nice cleanups and assertions that should keep the sync demons at bay (or at least make it easier to kill them). A few of the comments aren't immediately crystal clear to me because they fall from specific snafus or historical journeys you had to trudge through, but they're clear enough that they'll make sense if/when I find myself in those areas of code. (If I was truly confused about something and thought it should be reworded, I'd comment about it and ask for clarification.)

I do appreciate the informal and comedic touches -- methods that end in "Yo", the jokes in comments... insert an implied "lol" and/or "YES" response for most of those.
Attachment #8468253 - Flags: review?(m) → review+
I jinxed the on-device testing thing.  There's still some outstanding internal transaction that freaks IndexedDB out on devices.  I briefly instrumented IndexedDB and was able to conclude that it is our "b2g-email" db and not some other user of IndexedDB in our process.
I cranked up the logging in the form of "realtime-dangerous" which is based on an old patchset to shunt loggest logging directly to logcat.  I'm proposing we adopt it as a new mode.  I also have a cleanup of the gaia secret debug UI that makes that UI less awkward.

Anywho, if we run with IndexedDB's profiling markers shunted to adb logcat via a small patch, we can see it's this transaction getting interrupted at shutdown:
08-10 03:16:14.524 I/Gecko   (15438): IndexedDB Request 8: database("b2g-email").transaction("readonly").objectStore("headerBlocks").get("0/0:0")

That should not be happening, of course.  And then if we look a little bit before in our "realtime-dangerous" logs, we see:
08-10 03:16:14.484 I/Gecko   (15438): ImapAccount(0)["saveAccountState_end","checkpointSync",0,1407654974498000,112]
08-10 03:16:14.484 I/Gecko   (15438): FolderStorage(0/0)["mutexedCall_end","sync",1407654974501000,113]
08-10 03:16:14.494 I/Gecko   (15438): CronSync([object Object])["syncAccountHeaders_end","0",[],1407654974510000,114]
08-10 03:16:14.494 I/Gecko   (15438): CronSync([object Object])["syncAccount_end","0",1407654974512000,115]
08-10 03:16:14.504 I/Gecko   (15438): FolderStorage(0/0)["mutexedCall_begin","sync",1407654974517000,116]
08-10 03:16:14.504 I/Gecko   (15438): FolderStorage(0/0)["loadBlock_begin","header","0",1407654974518000,117,null]

That second mutexed call under the guise of "sync" should definitely not be happening.  So we look back a bit earlier:
08-10 03:16:13.164 I/Gecko   (15438): MailBridge(undefined)["cmd","viewFolderMessages",1407654973179000,26,1407654973180000,27,null]

That also does not seem like it should be happening, so then we look in the front-end code and we find that this is probably header_cursor.js probably being triggered by model.js.  While we still want to do some cleanup on them and migrate them into the back-end in general, I think the answer for this bug is that the front-end should not be loading/triggering them unless it's trying to spin up the UI.  The pure-cronsync mode does not need them.
It sounds like even when we do refactor model to go inside the API, we will just need to be careful about when we make new slices in the front end.

For now, if you want a fix, I tried this in a branch:

https://github.com/jrburke/gaia/compare/email-lazy-header-cursor

In it, header_cursor waits until there are actual listeners that want to communicate with it before setting up a slice. As long as no UI card views are created, then it just lays dormant. Initial testing with a background sync and the log statement in the fresh slice method indicate it is not created if a pure background sync with no UI showing.

If you think this would be useful to land, I can formalized by creating a new bug and tie a pull request to it.
:jrburke, that's awesome, thanks!  I'd like to fold that patch into the gaia portion of my landing of this bug and uplift it too, if that's okay with you.

Also, if you can r? this commit to update the secret debug UI/etc., that would also be good, as I want to fold that into the patch too because I also want the super realtime logging:
https://github.com/asutherland/gaia/commit/a1eec313916c692d588582dcfc62f7c5f71a84f1

(If you want to test with all of it, that's part of the https://github.com/asutherland/gaia/compare/email-cronsync-save branch which is just that now, and if you install-gaia the https://github.com/asutherland/gaia-email-libs-and-more/tree/cronsync-tests branch into it, you'll get that stuff and it will work, etc. etc.)
Flags: needinfo?(jrburke)
(In reply to James Burke [:jrburke] from comment #9)
> It sounds like even when we do refactor model to go inside the API, we will
> just need to be careful about when we make new slices in the front end.

Oh, indeed.  And that's an example of something where we definitely want front-end integration tests for cronsync too, probably just with some guards that cause explosions/failure if the UI tries to spin stuff up.  Really, until we have ServiceWorkers and can use SharedWorkers there's always going to be some risk of the front-end spinning something up because of all the heroics it has to perform based on alarm/activity/notification/etc. and with the lazy loading, etc.
r+ on the secret debug makeover, had one comment left in the commit. I also tried out this style block in setup_cards.css to center the buttons with some space around them:

.card-settings-debug button {
  display: block;
  margin: 1.5rem auto;
  width: calc(100% - 3rem);
}

(copied from the delete button in settings_account.html)

Feel free to pull in the header_cursor change as you see fit, mind the console.log in there. 

I gave it a quick pass when I tried it: both waiting for sync to run in background, and no log. I then tried a sync and then a "bring email to front", then a cold launch test, then going next/prev in the message_reader. Seemed to behave as expected.

The tests longer term to fail if the front end triggers inappropriate slice work sound good.
Flags: needinfo?(jrburke)
Comment on attachment 8468253 [details] [review]
GELAM PR: cronsync IMAP tests plus many required cleanups in tests, POP3/ActiveSync as trivial in-bug follow-up

One extra commit on the pull request to be (incrementally) reviewed.  It's at:
https://github.com/asutherland/gaia-email-libs-and-more/commit/832ac891e0f4f2fb4c581e80561fb407a73d5241


And it:
- Changes cronsync to no longer be created until after the universe has started up, avoiding some gratuitous promise stuff.  This resulted in some re-indentation, so it may be advisable to look at the whitespace-version of the diff:
https://github.com/asutherland/gaia-email-libs-and-more/commit/832ac891e0f4f2fb4c581e80561fb407a73d5241?w=1

- Adds "realtime-dangerous" which gets explained and generally looks like:
08-11 16:25:00.908 I/Gecko   ( 1719): WLOG: FETCHED 21 known id 12 known srvid 12388 actual id 12412
08-11 16:25:00.908 I/Gecko   ( 1719): ImapProtoConn(142)["data",25,1407788700925000,87,"A7 OK UID FETCH completed"]
08-11 16:25:00.918 I/Gecko   ( 1719): ImapProtoConn(142)["cmd_end","A7 ","UID FETCH",1407788700926000,88," 12412,12410,12409,12408,12406,12405,12404,12403,12402,12401,12400,12399,12398,12397,12396,12394,12393,12392,12391,12390,12389,12388 (UID FLAGS INTERNALDATE)"]
08-11 16:25:00.918 I/Gecko   ( 1719): ImapFolderConn(0/0)["syncDateRange_end",0,22,0,1407715200000,null,null,null,1407788700928000,89]
08-11 16:25:00.918 I/Gecko   ( 1719): WLOG: Sync Completed! null days 22 messages synced
08-11 16:25:00.918 I/Gecko   ( 1719): WLOG: folder message count 9971 dbCount 34 syncedThrough null oldest known 1407464437000
08-11 16:25:00.918 I/Gecko   ( 1719): ImapAccount(0)["saveAccountState_begin","checkpointSync",null,1407788700931000,90]
08-11 16:25:00.928 I/Gecko   ( 1719): MailBridge(undefined)["send","batchSlice",1407788700936000,91,{"type":"batchSlice","handle":2,"status":"synced","progress":0,"atTop":false,"atBottom":false,"userCanGrowUpwards":false,"userCanGrowDownwards":false,"sliceUpdates":[{"updates":[0,{"id":"0/0","name":"Inbox","type":"inbox","path":"INBOX","parentId":null,"delim":"/","depth":0,"lastSyncedAt":1407788700929}],"type":"update"}]}]
08-11 16:25:00.958 I/Gecko   ( 1719): ImapAccount(0)["saveAccountState_end","checkpointSync",0,1407788700970000,92]
08-11 16:25:00.958 I/Gecko   ( 1719): FolderStorage(0/0)["mutexedCall_end","sync",1407788700971000,93]
08-11 16:25:00.958 I/Gecko   ( 1719): CronSync([object Object])["syncAccountHeaders_end","0",[],1407788700972000,94]
08-11 16:25:00.958 I/Gecko   ( 1719): CronSync([object Object])["syncAccount_end","0",1407788700972000,95]
08-11 16:25:00.958 I/Gecko   ( 1719): ImapAccount(0)["releaseConnection",1407788700973000,96,"0/0","sync"]
08-11 16:25:00.958 I/Gecko   ( 1719): ImapProtoConn(142)["cmd_begin","A8 ","CLOSE",1407788700974000,97,null]
08-11 16:25:00.968 I/Gecko   ( 1719): WLOG: Killing unused IMAP connection.
08-11 16:25:00.968 I/Gecko   ( 1719): ImapAccount(0)["deadConnection","unused",1407788700977000,98,null]
08-11 16:25:00.968 I/Gecko   ( 1719): CronSync([object Object])["killSlices",0,1407788700978000,99]
08-11 16:25:00.968 I/Gecko   ( 1719): MailBridge(undefined)["send","cronSyncStop",1407788700979000,100,{"type":"cronSyncStop","accountsResults":{"accountIds":["0"]}}]
08-11 16:25:00.968 I/GeckoDump( 1719): LOG: email oncronsyncstop: 0
08-11 16:25:00.968 I/Gecko   ( 1719): CronSync([object Object])["syncAccounts_end",{"accountIds":["0"]},1407788700980000,101]
08-11 16:25:00.968 I/GeckoDump( 1719): LOG: email: clearing wake locks for "id0"
08-11 16:25:00.968 I/Gecko   ( 1719): CronSync([object Object])["cronSync_end",1407788700980000,102]
08-11 16:25:00.968 I/GeckoDump( 1719): LOG: mail sync complete, closing mail app 13181.683159
08-11 16:25:00.988 I/Gecko   ( 1719): 

Yes, the fact that CronSync is reported as having an object for its semantic ident is probably a bug, but I think that one's survivable for now, although I can fix it.

Note: I'm also going to address the comment nits you pointed out now.
Attachment #8468253 - Flags: review+ → review?(m)
Comment on attachment 8468253 [details] [review]
GELAM PR: cronsync IMAP tests plus many required cleanups in tests, POP3/ActiveSync as trivial in-bug follow-up

Looks good. This mode should prove useful.
Attachment #8468253 - Flags: review?(m) → review+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: