Closed Bug 1337910 Opened 3 years ago Closed 3 years ago
Push messages might take a long time to be processed, on the order of minutes
Stemming from Bug 1329793. (In reply to Sebastian Kaspari (:sebastian) from comment #12) > I often see this fail too when sending tabs from the desktop to my phone. > However it's working some times. So for example when sending a tab after not > using the app for some time, it does not work and I have to sync manually to > receive the tab. However sending another tab right after that often > succeeds. So at least for me it's not permanently broken. Some notes from looking at logcat: Whenever I've seen it "not work", it tends to be 1) delayed, 2) when gecko is running but haven't been used for a while. Specifically, I see that push messages comes in almost momentarily, but it takes a while to hear back from gecko with a decoded payload. Sample logs, note the 1 minute gap: > 02-06 10:13:09.038 4887 7249 D GeckoPushGCM: Message received. Processing on background thread. > 02-06 10:13:09.039 4887 5483 I GeckoPushService: Google Play Services GCM message received; delivering. > 02-06 10:13:09.039 4887 5483 I GeckoPushService: Message directed to service: fxa > 02-06 10:13:09.039 4887 5483 D Telemetry: SendUIEvent: event = action.1 method = service timestamp = 1647855 extras = dom-push-api > 02-06 10:13:09.042 4887 5483 I GeckoPushService: Delivering dom/push message to decode to Gecko! > 02-06 10:13:09.043 4887 5485 I GeckoFxAccountsPush: FxAccountsPush _decodePushMessage > > [... bunch of stuff unrelated to fennec/gecko...] > > 02-06 10:14:04.717 4887 5483 I GeckoPushService: Handling event: FxAccountsPush:ReceivedPushMessageToDecode:Response > 02-06 10:14:04.717 4887 5483 I FxAccountPush: Handling FxA Push Message > 02-06 10:14:04.719 4887 5483 I GeckoLogger: fennec :: FirefoxAccounts :: Requesting sync. This is what happens during the "gap": https://dxr.mozilla.org/mozilla-central/source/mobile/android/components/FxAccountsPush.js?q=FxAccountsPush+_decodePushMessage&redirect_type=single#105-120
After investigating, it seems like this delay is caused by _checkActivating in PushService.jsm (called by _getByPageRecord > registration > PushComponents.js#_handleRequest > PushComponents.js#getSubscription). This is where it gets interesting: the line we hit  shouldn't do anything fancy. : https://dxr.mozilla.org/mozilla-central/source/dom/push/PushService.jsm#181
Nice find! Is it possible "quit-application" fires and we shut the Push service down, even though Gecko is still running?
Thanks to jchen's help, I have been able to narrow the problem down to . Looks like we get blocked here for a while.  https://dxr.mozilla.org/mozilla-central/source/widget/android/nsAppShell.cpp#631
Jim/snorp, I wonder if you might have any pointers regarding Comment 3, or perhaps could forward this NI to someone else who might be able to help? Thanks!
Eduoard, I guess you can reproduce this? I believe a hang here indicates that we get stuck processing the Android message loop on the Gecko thread. It would be useful to get a java stack trace there. I think you can call call 'dump-java-stack' from within jimdb, or 'call dvmDumpAllThreads(true)'.
Flags: needinfo?(snorp) → needinfo?(eoger)
Looks to be a weird interaction between promises and the event loop. I'm working on a fix/workaround.
Call NotifyObservers from the Gecko event queue instead of the Android event queue, to better support Promises in notified JS code.
Attachment #8843415 - Flags: review?(snorp)
Edouard, this patch work for me locally, if you want to test it.
Attachment #8843415 - Flags: review?(snorp) → review+
I have been testing this for almost an hour now, it seems this patch did the trick. However, his bug is not reliably reproducible, so I'd probably wait on grisha to test that too before marking this bug as fixed.
I'll test it today as well. If it's generally useful, and seems to improve things in our case, happy to have it on nightly and get softvision to help out.
If this works we should ask for uplift asap
Pushed by firstname.lastname@example.org: https://hg.mozilla.org/integration/mozilla-inbound/rev/06745caaa027 Notify observers from Gecko event queue; r=snorp
Assignee: nobody → nchen
Status: NEW → ASSIGNED
tracking-fennec: --- → ?
P1 so we have this on our dashboard
Priority: -- → P1
Comment on attachment 8843415 [details] [diff] [review] Notify observers from Gecko event queue (v1) Approval Request Comment [Feature/Bug causing the regression]: N/A [User impact if declined]: Push messages (e.g. from Fx Account) can take a long time to be processed. [Is this code covered by automated tests?]: No [Has the fix been verified in Nightly?]: Yes [Needs manual test from QE? If yes, steps to reproduce]: No [List of other uplifts needed for the feature/fix]: None [Is the change risky?]: Slightly [Why is the change risky/not risky?]: The patch is pretty simple and we're in a new release cycle. [String changes made/needed]: None
Comment on attachment 8843415 [details] [diff] [review] Notify observers from Gecko event queue (v1) Fix a push message issue. Aurora54+.
Attachment #8843415 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 8843415 [details] [diff] [review] Notify observers from Gecko event queue (v1) This seems like a core scenario, let's fix this regression, Beta53+
Attachment #8843415 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
This is breaking session restore on Aurora and Beta because on the Gecko side "Session:Restore" now arrives *before* any of the "Tab:Load" messages that normally happen beforehand.
You need to log in before you can comment on or make changes to this bug.