Only a single instance of non-content child processes can exist on Android, breaking tests which use multiple gecko runtimes
Categories
(GeckoView :: General, defect, P3)
Tracking
(Not tracked)
People
(Reporter: robwu, Unassigned)
References
Details
When I run a xpcshell test on Android, I get tons of logspam that is a repetition of the following two lines:
D ServiceAllocator: org.mozilla.gecko.process.GeckoChildProcessServices$gpu updateBindings: FOREGROUND priority, 0 importance, 3 successful binds, 0 failed binds, 0 successful unbinds
W ServiceChildProcess: This process belongs to a different GeckoRuntime owner: a6e047f4-b59e-44e6-8fe7-2e77dfa27779 process: b4d03bfd-e47c-4c35-80d3-687a1b78ffce
Here is an example of a 350 MB logcat-emulator-5554.log from a try push:
- 1 963 881 lines
- 1 527 627 lines with the above two messages.
To get anything useful out of adb logcat locally, I currently have to filter logcat as follows:
adb logcat | grep -vE 'ServiceChildProcess: This process belongs to a different GeckoRuntime owner|ServiceAllocator: org.mozilla.gecko.process.GeckoChildProcessServices'
If you want to reproduce, create a .mozconfig to develop for Android (artifact builds are OK):
./mach android-emulator
./mach test toolkit/components/extensions/test/xpcshell/test_ext_alarms_does_not_fire.js
and in another terminal, adb logcat :
07-21 19:14:47.812 5136 5152 I Gecko : {"action":"log","time":1689959687807,"thread":null,"pid":null,"source":"xpcshell/head.js","level":"INFO","message":"xpcshell-remote.ini:toolkit/components/extensions/test/xpcshell/test_ext_alarms_does_not_fire.js | Starting test_cleared_alarm_does_not_fire"}
07-21 19:14:47.813 5136 5152 I Gecko :
07-21 19:14:47.813 5136 5152 I Gecko : {"action":"log","time":1689959687813,"thread":null,"pid":null,"source":"xpcshell/head.js","level":"INFO","message":"(xpcshell/head.js) | test test_cleared_alarm_does_not_fire pending (2)"}
07-21 19:14:47.848 5136 5152 I Gecko :
07-21 19:14:47.848 5136 5152 I Gecko : {"action":"log","time":1689959687848,"thread":null,"pid":null,"source":"xpcshell/head.js","level":"INFO","message":"\"Extension attached\""}
07-21 19:14:47.848 5136 5152 I Gecko :
07-21 19:14:47.848 5136 5152 I Gecko : {"action":"log","time":1689959687848,"thread":null,"pid":null,"source":"xpcshell/head.js","level":"INFO","message":"(xpcshell/head.js) | test run_next_test 1 finished (2)"}
07-21 19:14:47.962 5136 5152 D : HostConnection::get() New Host Connection established 0x7bdbf081f6a0, tid 5152
07-21 19:14:47.967 5136 5152 D EGL_emulation: eglCreateContext: 0x7bdbf08a7440: maj 3 min 0 rcv 3
07-21 19:14:47.968 5136 5152 D EGL_emulation: eglMakeCurrent: 0x7bdbf08a7440: ver 3 0 (tinfo 0x7bdbf08ba6e0)
07-21 19:14:47.972 5136 5152 E eglCodecCommon: glUtilsParamSize: unknow param 0x00008cdf
07-21 19:14:47.972 5136 5152 E eglCodecCommon: glUtilsParamSize: unknow param 0x00008824
07-21 19:14:47.989 5136 5168 D ServiceAllocator: org.mozilla.gecko.process.GeckoChildProcessServices$gpu updateBindings: FOREGROUND priority, 0 importance, 3 successful binds, 0 failed binds, 0 successful unbinds
07-21 19:14:47.997 5053 5070 W ServiceChildProcess: This process belongs to a different GeckoRuntime owner: a5be9210-1374-4549-bb9f-86271cd1137b process: dcc7ad4b-a327-40fe-8490-5baa279e0cda
07-21 19:14:48.011 5136 5168 D ServiceAllocator: org.mozilla.gecko.process.GeckoChildProcessServices$gpu updateBindings: FOREGROUND priority, 0 importance, 3 successful binds, 0 failed binds, 0 successful unbinds
07-21 19:14:48.013 5053 5070 W ServiceChildProcess: This process belongs to a different GeckoRuntime owner: a5be9210-1374-4549-bb9f-86271cd1137b process: dcc7ad4b-a327-40fe-8490-5baa279e0cda
The last two lines repeat 1237 times in this one test run, interleaved with other usual log output. Out of the 3.2k log lines, 2.5k was from this logspam...
| Reporter | ||
Updated•2 years ago
|
| Reporter | ||
Comment 1•2 years ago
|
||
Jamie - the logging was introduced in bug 1741156. Do you know what's going on here?
Besides being a pain during development, this also results in gigabytes, if not terabytes of storage space consumption in CI.
Comment 2•2 years ago
•
|
||
I think what's happening here is that in xpcshell tests we create many GeckoRuntimes so that we can run multiple tests in parallel. Each of these runtimes attempts to launch their own GPU process but android only allows one instance of each service, in this case org.mozilla.gecko.process.GeckoChildProcessServices$gpu. So after the first runtime successfully launches the GPU process, each subsequent runtime attempting to do so will instead notify the existing instance we attempted to start it. The existing instance will see that it belongs to a different runtime, and log this warning, and we will repeatedly retry launching it.
This logging existed before bug 1741156, back when content processes were the only type of child process. While we only declare one gpu process service (and one socket, one utility, etc), we declare many content process ones, eg tab1, tab2, etc. If gecko attempts to launch a content process and picks a service already in use then we log this warning and retry with a random tabN until we find one which succeeds. However, with the GPU process (and socket process, utility process, etc) this will never succeed as we've only declared one service for the process type. Note that for the content processes we can encounter this warning during normal operation because multiple content processes are allowed, whereas for GPU and other process types we can only encounter it during tests due to having multiple GeckoRuntimes.
I'm unsure what the consequences of this are. Perhaps we can simply remove the logging. Or perhaps this is an actual problem. We only support multiple GeckoRuntimes during tests so it's not an issue in the wild, but perhaps it might causes issues during tests. Maybe we need to declare multiple services for each type of child process and use the same logic as for content processes to select a random service name during launch. (or perhaps even a generic childN services could be used for both content and non-content child processes.) Kayacan, perhaps you or somebody else on the geckoview team might be better placed to answer that?
| Reporter | ||
Comment 3•2 years ago
|
||
This logging existed before bug 1741156,
Bug 1696460 added the original logging in the allow-multiple-runtimes patch, specifically https://hg.mozilla.org/mozilla-central/rev/9c4199568bbc4f6f1a7c439149c744156892c788
If gecko attempts to launch a content process and picks a service already in use then we log this warning and retry with a random
tabNuntil we find one which succeeds. However, with the GPU process (and socket process, utility process, etc) this will never succeed as we've only declared one service for the process type.
So that would require revisiting the logic here: https://searchfox.org/mozilla-central/rev/385f408040a8bb5fd9d168fb16a4e9071a13938d/mobile/android/geckoview/src/main/java/org/mozilla/gecko/process/GeckoProcessManager.java#822-828
Comment 4•2 years ago
|
||
Hey Rob, agreed! And thanks Jaime for the great details there. We need to revisit the logic there to get rid of those spamming logs. There should not be multiple instances of the GeckoRuntime except the tests, as Jaime said. Removing the logs or, better, refactoring the logic of retrying to start the child service process would solve this issue. I will give it P3 and S3 at the moment to keep it in the backlog for the upcoming sprints. Worth to note that Fission is being implemented on Android at the moment, so the child service process management logic will very possibly be changing. So this maybe addressed there as well. There's no ticket/plan for that at the moment, that i know of. When a ticket is created for the process management on GeckoView layer for the Fission project, I'll link it here.
| Reporter | ||
Comment 5•1 year ago
|
||
Is there any chance of revisiting this bug? I keep having to come back to this bug whenever I need to look at logcat output. E.g. given the logcat artifact of a try push, I have to run the following to trim the log file to something readable.
cat logcat-emulator-5554.log | grep -vE 'ServiceChildProcess: This process belongs to a different GeckoRuntime owner|ServiceAllocator: org.mozilla.gecko.process.GeckoChildProcessServices' > logcat.log
Comment 6•1 year ago
|
||
We could perhaps temporarily disable the GPU process during xpcshell tests.. Not ideal to be running them in a different config than we want to ship, but OTOH I think under the current system all but the first attempted GPU process launch will fail anyway, hence the logspam. So we're not exactly testing the GPU process as it stands.
If we do go down this path I'd definitely like it to be a temporary measure, we should really strive to get xpcshell tests running with a working GPU process
Updated•1 month ago
|
| Reporter | ||
Comment 7•1 month ago
|
||
Is this work going to be scheduled soon?
The immediate impact of logspam is the need to filter log files before it is readable. I find myself searching&linking for this bug relatively often in comments on bugs/patches/chats where there is a need to analyze a log file, using keyword "logcat".
If the root cause is difficult to address, an acceptable resolution to this bug is to disable the logspam, in which case the fix of the root cause can be a separate bug.
Comment 8•1 month ago
|
||
I'm working on bug 1995541. This will avoid the logcat spam, but not address the underlying issue. Affected tests will end up being run without a GPU process (but they already are, just after a 5 second wait and possibly being killed in the meantime). We do still need to fix the underlying issue as running these tests in a non-default configuration (i.e. without a GPU and other child processes) is not ideal, hopefully the geckoview team can find time to address that at some point
Description
•