Push messages might take a long time to be processed, on the order of minutes

RESOLVED FIXED in Firefox 53

Status

Android Background Services
Firefox Accounts
P1
normal
RESOLVED FIXED
2 years ago
a year ago

People

(Reporter: Grisha, Assigned: jchen)

Tracking

unspecified
Firefox 55

Firefox Tracking Flags

(fennec54+, firefox53 fixed, firefox54 fixed, firefox55 fixed)

Details

Attachments

(1 attachment)

(Reporter)

Description

2 years ago
Stemming from Bug 1329793.

(In reply to Sebastian Kaspari (:sebastian) from comment #12)
> I often see this fail too when sending tabs from the desktop to my phone.
> However it's working some times. So for example when sending a tab after not
> using the app for some time, it does not work and I have to sync manually to
> receive the tab. However sending another tab right after that often
> succeeds. So at least for me it's not permanently broken.

Some notes from looking at logcat:

Whenever I've seen it "not work", it tends to be 1) delayed, 2) when gecko is running but haven't been used for a while. Specifically, I see that push messages comes in almost momentarily, but it takes a while to hear back from gecko with a decoded payload. Sample logs, note the 1 minute gap:

> 02-06 10:13:09.038  4887  7249 D GeckoPushGCM: Message received.  Processing on background thread.
> 02-06 10:13:09.039  4887  5483 I GeckoPushService: Google Play Services GCM message received; delivering.
> 02-06 10:13:09.039  4887  5483 I GeckoPushService: Message directed to service: fxa
> 02-06 10:13:09.039  4887  5483 D Telemetry: SendUIEvent: event = action.1 method = service timestamp = 1647855 extras = dom-push-api
> 02-06 10:13:09.042  4887  5483 I GeckoPushService: Delivering dom/push message to decode to Gecko!
> 02-06 10:13:09.043  4887  5485 I GeckoFxAccountsPush: FxAccountsPush _decodePushMessage
> 
> [... bunch of stuff unrelated to fennec/gecko...]
> 
> 02-06 10:14:04.717  4887  5483 I GeckoPushService: Handling event: FxAccountsPush:ReceivedPushMessageToDecode:Response
> 02-06 10:14:04.717  4887  5483 I FxAccountPush: Handling FxA Push Message
> 02-06 10:14:04.719  4887  5483 I GeckoLogger: fennec :: FirefoxAccounts :: Requesting sync.

This is what happens during the "gap": https://dxr.mozilla.org/mozilla-central/source/mobile/android/components/FxAccountsPush.js?q=FxAccountsPush+_decodePushMessage&redirect_type=single#105-120
After investigating, it seems like this delay is caused by _checkActivating in PushService.jsm (called by _getByPageRecord > registration > PushComponents.js#_handleRequest > PushComponents.js#getSubscription).

This is where it gets interesting: the line we hit [0] shouldn't do anything fancy.

[0]: https://dxr.mozilla.org/mozilla-central/source/dom/push/PushService.jsm#181
Nice find! Is it possible "quit-application" fires and we shut the Push service down, even though Gecko is still running?
Thanks to jchen's help, I have been able to narrow the problem down to [0].
Looks like we get blocked here for a while.

[0] https://dxr.mozilla.org/mozilla-central/source/widget/android/nsAppShell.cpp#631
(Reporter)

Comment 4

a year ago
Jim/snorp, I wonder if you might have any pointers regarding Comment 3, or perhaps could forward this NI to someone else who might be able to help? Thanks!
Flags: needinfo?(snorp)
Flags: needinfo?(nchen)
Eduoard, I guess you can reproduce this? I believe a hang here indicates that we get stuck processing the Android message loop on the Gecko thread. It would be useful to get a java stack trace there. I think you can call call 'dump-java-stack' from within jimdb, or 'call dvmDumpAllThreads(true)'.
Flags: needinfo?(snorp) → needinfo?(eoger)
(Assignee)

Comment 6

a year ago
Looks to be a weird interaction between promises and the event loop. I'm working on a fix/workaround.
Flags: needinfo?(nchen)
(Assignee)

Comment 7

a year ago
Created attachment 8843415 [details] [diff] [review]
Notify observers from Gecko event queue (v1)

Call NotifyObservers from the Gecko event queue instead of the Android
event queue, to better support Promises in notified JS code.
Attachment #8843415 - Flags: review?(snorp)
(Assignee)

Comment 8

a year ago
Edouard, this patch work for me locally, if you want to test it.
Attachment #8843415 - Flags: review?(snorp) → review+
I have been testing this for almost an hour now, it seems this patch did the trick.
However, his bug is not reliably reproducible, so I'd probably wait on grisha to test that too before marking this bug as fixed.
Flags: needinfo?(eoger)
(Reporter)

Comment 10

a year ago
I'll test it today as well. If it's generally useful, and seems to improve things in our case, happy to have it on nightly and get softvision to help out.
If this works we should ask for uplift asap

Comment 12

a year ago
Pushed by nchen@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/06745caaa027
Notify observers from Gecko event queue; r=snorp
(Assignee)

Updated

a year ago
Assignee: nobody → nchen
Status: NEW → ASSIGNED
tracking-fennec: --- → ?
P1 so we have this on our dashboard
Priority: -- → P1
(Assignee)

Updated

a year ago
See Also: → bug 1344892

Comment 14

a year ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/06745caaa027
Status: ASSIGNED → RESOLVED
Last Resolved: a year ago
status-firefox55: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → Firefox 55
(Assignee)

Comment 15

a year ago
Comment on attachment 8843415 [details] [diff] [review]
Notify observers from Gecko event queue (v1)

Approval Request Comment
[Feature/Bug causing the regression]: N/A
[User impact if declined]: Push messages (e.g. from Fx Account) can take a long time to be processed.
[Is this code covered by automated tests?]: No
[Has the fix been verified in Nightly?]: Yes
[Needs manual test from QE? If yes, steps to reproduce]: No
[List of other uplifts needed for the feature/fix]: None
[Is the change risky?]: Slightly
[Why is the change risky/not risky?]: The patch is pretty simple and we're in a new release cycle.
[String changes made/needed]: None
Attachment #8843415 - Flags: approval-mozilla-beta?
Attachment #8843415 - Flags: approval-mozilla-aurora?
status-firefox54: --- → affected
Comment on attachment 8843415 [details] [diff] [review]
Notify observers from Gecko event queue (v1)

Fix a push message issue. Aurora54+.
Attachment #8843415 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Comment 17

a year ago
bugherderuplift
https://hg.mozilla.org/releases/mozilla-aurora/rev/a4588ec03ebe
status-firefox54: affected → fixed
tracking-fennec: ? → 54+

Updated

a year ago
status-firefox53: --- → affected
Comment on attachment 8843415 [details] [diff] [review]
Notify observers from Gecko event queue (v1)

This seems like a core scenario, let's fix this regression, Beta53+
Attachment #8843415 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment 19

a year ago
bugherderuplift
https://hg.mozilla.org/releases/mozilla-beta/rev/353c1a7da9be
status-firefox53: affected → fixed
This is breaking session restore on Aurora and Beta because on the Gecko side "Session:Restore" now arrives *before* any of the "Tab:Load" messages that normally happen beforehand.

Updated

a year ago
Depends on: 1346634
You need to log in before you can comment on or make changes to this bug.