Closed Bug 1337910 Opened 3 years ago Closed 3 years ago

Push messages might take a long time to be processed, on the order of minutes


(Android Background Services Graveyard :: Firefox Accounts, defect, P1)



(fennec54+, firefox53 fixed, firefox54 fixed, firefox55 fixed)

Firefox 55
Tracking Status
fennec 54+ ---
firefox53 --- fixed
firefox54 --- fixed
firefox55 --- fixed


(Reporter: Grisha, Assigned: jchen)




(1 file)

Stemming from Bug 1329793.

(In reply to Sebastian Kaspari (:sebastian) from comment #12)
> I often see this fail too when sending tabs from the desktop to my phone.
> However it's working some times. So for example when sending a tab after not
> using the app for some time, it does not work and I have to sync manually to
> receive the tab. However sending another tab right after that often
> succeeds. So at least for me it's not permanently broken.

Some notes from looking at logcat:

Whenever I've seen it "not work", it tends to be 1) delayed, 2) when gecko is running but haven't been used for a while. Specifically, I see that push messages comes in almost momentarily, but it takes a while to hear back from gecko with a decoded payload. Sample logs, note the 1 minute gap:

> 02-06 10:13:09.038  4887  7249 D GeckoPushGCM: Message received.  Processing on background thread.
> 02-06 10:13:09.039  4887  5483 I GeckoPushService: Google Play Services GCM message received; delivering.
> 02-06 10:13:09.039  4887  5483 I GeckoPushService: Message directed to service: fxa
> 02-06 10:13:09.039  4887  5483 D Telemetry: SendUIEvent: event = action.1 method = service timestamp = 1647855 extras = dom-push-api
> 02-06 10:13:09.042  4887  5483 I GeckoPushService: Delivering dom/push message to decode to Gecko!
> 02-06 10:13:09.043  4887  5485 I GeckoFxAccountsPush: FxAccountsPush _decodePushMessage
> [... bunch of stuff unrelated to fennec/gecko...]
> 02-06 10:14:04.717  4887  5483 I GeckoPushService: Handling event: FxAccountsPush:ReceivedPushMessageToDecode:Response
> 02-06 10:14:04.717  4887  5483 I FxAccountPush: Handling FxA Push Message
> 02-06 10:14:04.719  4887  5483 I GeckoLogger: fennec :: FirefoxAccounts :: Requesting sync.

This is what happens during the "gap":
After investigating, it seems like this delay is caused by _checkActivating in PushService.jsm (called by _getByPageRecord > registration > PushComponents.js#_handleRequest > PushComponents.js#getSubscription).

This is where it gets interesting: the line we hit [0] shouldn't do anything fancy.

Nice find! Is it possible "quit-application" fires and we shut the Push service down, even though Gecko is still running?
Thanks to jchen's help, I have been able to narrow the problem down to [0].
Looks like we get blocked here for a while.

Jim/snorp, I wonder if you might have any pointers regarding Comment 3, or perhaps could forward this NI to someone else who might be able to help? Thanks!
Flags: needinfo?(snorp)
Flags: needinfo?(nchen)
Eduoard, I guess you can reproduce this? I believe a hang here indicates that we get stuck processing the Android message loop on the Gecko thread. It would be useful to get a java stack trace there. I think you can call call 'dump-java-stack' from within jimdb, or 'call dvmDumpAllThreads(true)'.
Flags: needinfo?(snorp) → needinfo?(eoger)
Looks to be a weird interaction between promises and the event loop. I'm working on a fix/workaround.
Flags: needinfo?(nchen)
Call NotifyObservers from the Gecko event queue instead of the Android
event queue, to better support Promises in notified JS code.
Attachment #8843415 - Flags: review?(snorp)
Edouard, this patch work for me locally, if you want to test it.
Attachment #8843415 - Flags: review?(snorp) → review+
I have been testing this for almost an hour now, it seems this patch did the trick.
However, his bug is not reliably reproducible, so I'd probably wait on grisha to test that too before marking this bug as fixed.
Flags: needinfo?(eoger)
I'll test it today as well. If it's generally useful, and seems to improve things in our case, happy to have it on nightly and get softvision to help out.
If this works we should ask for uplift asap
Pushed by
Notify observers from Gecko event queue; r=snorp
Assignee: nobody → nchen
tracking-fennec: --- → ?
P1 so we have this on our dashboard
Priority: -- → P1
See Also: → 1344892
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 55
Comment on attachment 8843415 [details] [diff] [review]
Notify observers from Gecko event queue (v1)

Approval Request Comment
[Feature/Bug causing the regression]: N/A
[User impact if declined]: Push messages (e.g. from Fx Account) can take a long time to be processed.
[Is this code covered by automated tests?]: No
[Has the fix been verified in Nightly?]: Yes
[Needs manual test from QE? If yes, steps to reproduce]: No
[List of other uplifts needed for the feature/fix]: None
[Is the change risky?]: Slightly
[Why is the change risky/not risky?]: The patch is pretty simple and we're in a new release cycle.
[String changes made/needed]: None
Attachment #8843415 - Flags: approval-mozilla-beta?
Attachment #8843415 - Flags: approval-mozilla-aurora?
Comment on attachment 8843415 [details] [diff] [review]
Notify observers from Gecko event queue (v1)

Fix a push message issue. Aurora54+.
Attachment #8843415 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
tracking-fennec: ? → 54+
Comment on attachment 8843415 [details] [diff] [review]
Notify observers from Gecko event queue (v1)

This seems like a core scenario, let's fix this regression, Beta53+
Attachment #8843415 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
This is breaking session restore on Aurora and Beta because on the Gecko side "Session:Restore" now arrives *before* any of the "Tab:Load" messages that normally happen beforehand.
Depends on: 1346634
You need to log in before you can comment on or make changes to this bug.