Testing on Android: Delegate test runner to run on Android
Categories
(Core :: JavaScript Engine, enhancement, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox88 | --- | fixed |
People
(Reporter: nbp, Assigned: nbp)
References
Details
Attachments
(2 files)
Lately, we landed some optimization to execute the test suite faster by sharing the XDR version of the self-hosted code. This optimization had little to no impact on Android test suite execution.
One hypothesis is that the ADB round-trip for spawning each test is too costly, and made the improvement of the self-hosted code neglectable compared to the ADB overhead.
While this bug has the potential for improving the test suite, it is also an interesting data-point on whether we should pursue on-disk caching on the self-hosted code as XDR or not.
Assignee | ||
Comment 1•4 years ago
|
||
First shot at this, suggest that we might be able to save ~24 minutes per android JIT job!
Which would be roughly 24h (= 24min * 10 * 6) in total per full-test.
Assignee | ||
Comment 2•4 years ago
|
||
Assignee | ||
Comment 3•4 years ago
|
||
Assignee | ||
Comment 4•4 years ago
|
||
Ok, after applying these changes, I see that random tests which are supposed to return immediately, even skipped tests are doing timeout.
I did not see any of these before, and I wonder if this could be some form of shutdown hang in the JS Shell.
I will instrument to dump the stack on SIGTERM.
Comment 5•4 years ago
|
||
FWIW, I've experienced adb just terminating apparently at random when running tests locally.
Assignee | ||
Comment 6•4 years ago
|
||
(In reply to Lars T Hansen [:lth] from comment #5)
FWIW, I've experienced adb just terminating apparently at random when running tests locally.
With the patch all the tests are run by a single shell script which is executed with a single adb command.
The JS shell is wrapped by a timeout
command provided by busybox.
Honestly, I am tempted to land the changes, even if these timeout issue exist, as this would be a huge time saving even if we have to restart a few test failure caused by these timeout.
Assignee | ||
Comment 7•4 years ago
|
||
(In reply to Nicolas B. Pierron [:nbp] from comment #4)
I see that random tests which are supposed to return immediately, even skipped tests are doing timeout.
I managed to instrument a JS Shell in such a way that it prints the stack¹, one of the failure reported the following stack, unfortunately lacking any JS shell symbols:
#01: ???[/data/local/tmp/test_root/bin/js +0x5e904]
#02: ???[/system/lib/libc.so +0x18aa8]
#03: syscall[/system/lib/libc.so +0x18e08]
#04: pthread_join[/system/lib/libc.so +0x47cf0]
Which seems to indicate that one of the Helper thread did not join back after the completion of the JS Shell.
¹ I learnt that inside mozglue
with has MozStack*
functions which are using various backend implementation and are able to recover this information.
Assignee | ||
Comment 8•4 years ago
•
|
||
Doing extra printf debugging, suggests that this could be an issue within the code which is joining the Helper threads.
However, I do not understand:
- Why aren't the processes killed after receiving a SIGTERM, which is supposed to be forwarded.
- Why the hang seems to get resolved as soon as the signal is received.
- What would be the issue in the
HelperThread
waiting system.
This try link show different tests failing, where the stack is printed when a SIGTERM is received and where fprintf
are added next to all pthread_join
calls.
The output seems to suggest that the we start joining with the HelperThread
, but suddently fail at joining more threads, and wait until the signal handler is called, to resume joining.
Assignee | ||
Comment 9•4 years ago
|
||
The following try link suggest that the failure is not restricted to HelperThread
but can also happen with WorkerThread
.
After some discussion with the team, it was decided that the best alternative, given the low likelyhood and the randomness of the failure would be to re-run the tests which are failing due to a timeout, and only report the last execution.
This would reduce other timeout but would most likely mute pthread_join
hanging issues.
Comment 10•4 years ago
|
||
Comment 11•4 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/6ce356a95473
https://hg.mozilla.org/mozilla-central/rev/e400022129be
Description
•