`RECEIVE_BOOT_COMPLETED` is not always available making the `WorkManager`/`JobScheduler` work unreliably
Categories
(Data Platform and Tools :: Glean: SDK, task, P3)
Tracking
(Not tracked)
People
(Reporter: Dexter, Unassigned)
References
Details
(Whiteboard: [telemetry:glean-rs:backlog])
From a conversation with Sebastian:
(Firefox Reality) One of their devices/stores doesn't allow apps with the RECEIVE_BOOT_COMPLETED which is used by workmanager and jobscheduler to persist background jobs.
Apparently, when using service-telemetry, on these devices, (with JobScheduler) they still see a large drop in telemetry pings. Even though telemetry itself is persisted.
This suggests that the RECEIVE_BOOT_COMPLETED might be used to persist jobs across restarts. We should make sure to account for that in Glean and/or verify that the WorkManager behaves consistently.
| Reporter | ||
Updated•6 years ago
|
Updated•6 years ago
|
| Reporter | ||
Comment 1•6 years ago
|
||
Note: this permission is automatically added to the app manifest, see the google issue.
| Reporter | ||
Comment 2•6 years ago
|
||
From https://issuetracker.google.com/issues/125753105, it suggests something like "You can always use manifest mergers and tools:node=remove to remove the components & permissions you don't want from your AndroidManifest.xml"
We could test if we still receive telemetry after suppressing that permission in the glean-sample app.
Moving this back to incoming, it's blocking FirefoxReality.
Updated•6 years ago
|
| Reporter | ||
Comment 3•6 years ago
|
||
I asked :daoshengmu via their FAQ doc if they're able to test if comment 2 works for them.
Comment 4•6 years ago
•
|
||
(In reply to Alessio Placitelli [:Dexter] from comment #3)
I asked :daoshengmu via their FAQ doc if they're able to test if comment 2 works for them.
We did it actually [1], and we have to override TelemetryScheduler to make jobs can be scheduled [2]. I am not sure if Glean will need us to do this. This was difficult to notice our telemetry didn't upload in local build, because the job scheduler still worked properly at some devices after removing RECEIVE_BOOT_COMPLETED. We found this telemetry missing after we deploy to the store. After done [2], the telemetry are back.
I will try to use Glean test API to figure it out as well.
[1] https://github.com/MozillaReality/FirefoxReality/pull/1359
[2] https://github.com/MozillaReality/FirefoxReality/blob/67bc4711974bb427c4b42dedf84f79705a887cd9/app/src/common/shared/org/mozilla/vrbrowser/telemetry/FxRTelemetryScheduler.java#L34
| Reporter | ||
Comment 5•6 years ago
|
||
(In reply to Daosheng Mu[:daoshengmu] from comment #4)
(In reply to Alessio Placitelli [:Dexter] from comment #3)
I asked :daoshengmu via their FAQ doc if they're able to test if comment 2 works for them.
We did it actually [1], and we have to override
TelemetrySchedulerto make jobs can be scheduled [2]. I am not sure if Glean will need us to do this.
I don't think Glean will require that, but we need to test. The WorkManager implementation does intentionally call builder.setPersisted(false);, as they use other modules for intercepting the boot complete message and rescheduling jobs.
In our case, we don't rely on restoring scheduled jobs.
This was difficult to notice our telemetry didn't upload in local build, because the job scheduler still worked properly at some devices after removing
RECEIVE_BOOT_COMPLETED. We found this telemetry missing after we deploy to the store. After done [2], the telemetry are back.
In this case, this should be fairly easy :) While you have both service-telemetry and the Glean SDK in your application, we could compare the volume of incoming pings and see if there's a drastic mismatch. If there is, then we can investigate a bit further.
I will try to use Glean test API to figure it as well.
Do you mean the Glean Debug View? If that's what you meant, yes, it could give us an idea if we're able to send things at all.
Comment 6•6 years ago
|
||
Closing this as INVALID since Glean no longer relies on the PeriodicWorkRequest which was what was breaking when we didn't see the RECEIVE_BOOT_COMPLETE flag.
We still use OneTimeWorkRequest for uploading pings, but this doesn't rely on the RECEIVE_BOOT_COMPLETE flag since it doesn't persist the job across restarts and thus is unaffected by this issue.
| Reporter | ||
Comment 7•6 years ago
|
||
(In reply to Travis Long [:travis_] from comment #6)
We still use
OneTimeWorkRequestfor uploading pings, but this doesn't rely on theRECEIVE_BOOT_COMPLETEflag since it doesn't persist the job across restarts and thus is unaffected by this issue.
This is the reason why this bug is still open: legacy telemetry did not rely on persistence across reboots either, but they were loosing pings. We need to validate this in the Firefox reality validation.
Comment 8•6 years ago
•
|
||
(In reply to Alessio Placitelli [:Dexter] from comment #7)
(In reply to Travis Long [:travis_] from comment #6)
We still use
OneTimeWorkRequestfor uploading pings, but this doesn't rely on theRECEIVE_BOOT_COMPLETEflag since it doesn't persist the job across restarts and thus is unaffected by this issue.This is the reason why this bug is still open: legacy telemetry did not rely on persistence across reboots either, but they were loosing pings. We need to validate this in the Firefox reality validation.
I respectfully disagree that the validation is necessary to close this bug. Let me try and better explain what I found while looking into this again: Legacy telemetry doesn't use WorkManager, but JobScheduler (which I know that WorkManager uses under the hood), but the RECEIVE_BOOT_COMPLETE flag is only required if the underlying JobScheduler is persistent. Legacy telemetry sets this persistent flag which is why it requires the RECEIVE_BOOT_COMPLETE flag and why legacy telemetry is subject to the bug. Since we no longer use PeriodicWorkRequest, WorkManager doesn't set this persistent flag within JobScheduler when it invokes it under the hood, and thus, the RECEIVE_BOOT_COMPLETE bug should not affect our upload tasks since it only affects re-starting a persisted job. Back when we still used the PeriodicWorkRequest for the MetricsPingScheduler, we were absolutely affected by this bug, but only as it related to that particular WorkRequest that was persisted by the OS.
I'm happy to keep this open if you feel the validation is needed to answer this, but I wanted to point out that the underlying conditions that made us subject to this bug in the first place have changed since this bug was filed.
| Reporter | ||
Comment 9•6 years ago
|
||
(In reply to Travis Long [:travis_] from comment #8)
I'm happy to keep this open if you feel the validation is needed to answer this, but I wanted to point out that the underlying conditions that made us subject to this bug in the first place have changed since this bug was filed.
Please see comment 5: I know we should not hit this, due to the difference in the implementation. I strongly believe we need to keep this open and validate that this works properly, closing it only if we can guarantee we're immune to the problem :) Given how weird Android works sometimes, I'd rather err on the side of caution on this!
Comment 10•6 years ago
|
||
(In reply to Alessio Placitelli [:Dexter] from comment #9)
(In reply to Travis Long [:travis_] from comment #8)
I'm happy to keep this open if you feel the validation is needed to answer this, but I wanted to point out that the underlying conditions that made us subject to this bug in the first place have changed since this bug was filed.
Please see comment 5: I know we should not hit this, due to the difference in the implementation. I strongly believe we need to keep this open and validate that this works properly, closing it only if we can guarantee we're immune to the problem :) Given how weird Android works sometimes, I'd rather err on the side of caution on this!
I have no problem with err'ing on the side of caution since I couldn't find androidx.jetpack source code to try and confirm what WorkManager does under the hood.
How about instead of closing, I make this depend on the FF-Reality validation so that we can come back and close it if everything checks out.
| Reporter | ||
Comment 11•6 years ago
|
||
(In reply to Travis Long [:travis_] from comment #10)
(In reply to Alessio Placitelli [:Dexter] from comment #9)
(In reply to Travis Long [:travis_] from comment #8)
How about instead of closing, I make this depend on the FF-Reality validation so that we can come back and close it if everything checks out.
+1!
Comment 12•5 years ago
|
||
Bug 1614687 is closed, so closing this.
Description
•