Intermittent /svg/coordinate-systems/viewBox-synthesized-in-img-001.tentative.html | single tracking bug
Categories
(Core :: SVG, defect, P5)
Tracking
()
People
(Reporter: intermittent-bug-filer, Unassigned)
References
Details
(Keywords: intermittent-failure, intermittent-testcase)
Filed by: abutkovits [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=504037071&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Kw4ZGa3DSMGUryF08PiBSQ/runs/0/artifacts/public/logs/live_backing.log
Reftest URL: https://hg.mozilla.org/mozilla-central/raw-file/tip/layout/tools/reftest/reftest-analyzer.xhtml#logurl=https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Kw4ZGa3DSMGUryF08PiBSQ/runs/0/artifacts/public/logs/live_backing.log&only_show_unexpected=1
[task 2025-04-15T11:12:24.525Z] 11:12:24 INFO - TEST-START | /svg/coordinate-systems/viewBox-synthesized-in-img-001.tentative.html
[task 2025-04-15T11:12:45.750Z] 11:12:45 INFO - TEST-UNEXPECTED-TIMEOUT | /svg/coordinate-systems/viewBox-synthesized-in-img-001.tentative.html | expected PASS
[task 2025-04-15T11:12:45.750Z] 11:12:45 INFO - TEST-INFO took 21225ms
[task 2025-04-15T11:12:45.750Z] 11:12:45 INFO - Restarting browser for new test group
[task 2025-04-15T11:12:46.816Z] 11:12:46 INFO - STDOUT: cleanup aborted: Unable to remount device
[task 2025-04-15T11:12:46.816Z] 11:12:46 INFO - Closing logging queue
[task 2025-04-15T11:12:46.816Z] 11:12:46 INFO - queue closed
[task 2025-04-15T11:12:46.831Z] 11:12:46 INFO - Setting android reftest preferences
[task 2025-04-15T11:12:46.831Z] 11:12:46 INFO - Setting up ssl
[task 2025-04-15T11:12:47.017Z] 11:12:47 INFO - certutil | b''
[task 2025-04-15T11:12:47.152Z] 11:12:47 INFO - certutil | b''
[task 2025-04-15T11:12:47.165Z] 11:12:47 INFO - certutil | b'\nCertificate Nickname Trust Attributes\n SSL,S/MIME,JAR/XPI\n\nweb-platform-tests CT,, \n'
[task 2025-04-15T11:12:47.863Z] 11:12:47 INFO - adb Granting important runtime permissions to org.mozilla.geckoview.test_runner
[task 2025-04-15T11:12:49.105Z] 11:12:49 INFO - adb launch_application: am start -W -n org.mozilla.geckoview.test_runner/org.mozilla.geckoview.test_runner.TestRunnerActivity -a android.intent.action.MAIN --es env0 MOZ_CRASHREPORTER=1 --es env1 MOZ_CRASHREPORTER_NO_REPORT=1 --es env2 MOZ_CRASHREPORTER_SHUTDOWN=1 --es env3 MOZ_HIDE_RESULTS_TABLE=1 --es env4 MOZ_IN_AUTOMATION=1 --es env5 MOZ_LOG=signaling:3,mtransport:4,DataChannel:4,jsep:4 --es env6 R_LOG_LEVEL=6 --es env7 R_LOG_DESTINATION=stderr --es env8 R_LOG_VERBOSE=1 --es env9 MOZ_PROCESS_LOG=/tmp/tmpggrjdmvepidlog --es env10 MINIDUMP_SAVE_PATH=/builds/worker/workspace/build/blobber_upload_dir --es env11 MOZ_DISABLE_NONLOCAL_CONNECTIONS=1 --es arg0 -no-remote --es arg1 -profile --es arg2 /data/local/tmp/test_root/profile --es arg3 --marionette --es arg4 --remote-allow-system-access --es arg5 about:blank
[task 2025-04-15T11:12:50.969Z] 11:12:50 INFO - Starting runner
[task 2025-04-15T11:12:51.934Z] 11:12:51 INFO - TEST-START | /svg/painting/color-interpolation-001.svg
Comment 1•10 days ago
|
||
This is a test I just added, FWIW.
Not sure why it would be timing out... It does run some scripts when loading, but not a lot. I'll take a look and see if anything jumps out at me...
Comment 2•9 days ago
•
|
||
It looks like the failure is intermittent, but the test is reliably slow here even when it doesn't take long enough to time out.
Here are two recent test runs where it passed but still took a long time (~14-17 seconds, whereas neighboring tests took ~500ms):
https://firefoxci.taskcluster-artifacts.net/c_VwMRdgTnKsrfWJfzjm8A/0/public/logs/live_backing.log
TEST-PASS | /svg/coordinate-systems/viewBox-synthesized-in-img-001.tentative.html | took 16446ms
https://firefoxci.taskcluster-artifacts.net/H7Pj-uqWToCYxUGzwCdh9g/0/public/logs/live_backing.log
TEST-PASS | /svg/coordinate-systems/viewBox-synthesized-in-img-001.tentative.html | took 14359ms
In the failure quoted in comment 0, the test took 21225ms (21 seconds), so I suspect the timeout threshold on this platform is 20 seconds - so even our "good" test runs are dangerously close to that threshold (i.e. the 14-17s "good" durations are quite close to 20s).
bug 1960779 should reduce some error logging, which might conceivably reduce overhead and test-duration here. If this is still too slow even after that, and it's still mysterious why things are slow, we could conceivably split the test into two pieces (each with a shorter duration), too.
Comment 3•8 days ago
•
|
||
I suspect this is taking so long due to the fact that the reference case is loading so many embed
elements -- 144 of them, to be exact -- and there's probably some overhead associated with doing that. I do see them appearing to load a bit slowly/incrementally in my local debug build, at least.
Also, it looks like this slowness isn't entirely Android-specific - here are some logs where this test takes between 4s-16s on linux debug, TSAN, and ASAN (whereas the previous test takes 200-500ms):
tsan: https://treeherder.mozilla.org/logviewer?job_id=504037856&repo=autoland&lineNumber=8757
asan: https://treeherder.mozilla.org/logviewer?job_id=504043567&repo=autoland&lineNumber=22373
debug: https://treeherder.mozilla.org/logviewer?job_id=504035101&repo=autoland&lineNumber=26421
I took a performance profile in my local debug build, and it looks like most of the time is spent setting up the embedded document, its JS global 'window' object, and stuff like that (for all of the 144 documents). That overhead is all not really necessary/interesting (and is a waste of compute time on our test runners - this test isn't worth burning 10-20 seconds on for each test run); so I'm going to simplify the testcase to just use specially crafted divs
to mock up the expected rendering. That version loads ~instantly.
Comment 4•7 days ago
|
||
(In reply to Daniel Holbert [:dholbert] from comment #3)
I suspect this is taking so long due to the fact that the reference case is loading so many
embed
elements -- 144 of them, to be exact -- and there's probably some overhead associated with doing that.
Bug 1961245 is fixing this and improves the load time by an order of magnitude or two per bug 1961245 comment 2.
I took a performance profile in my local debug build, and it looks like most of the time is spent setting up the embedded document
Following up on this, in case we want to dig more at this in the future....
I've captured performance profiles, taken in my debug build (to intentionally make things slower), comparing the testcase vs. the old version of the reference case (using <embed>
), loading each one 5 times (all "hard reloads" with Shift+Ctrl+R). I also modified both the testcase and the reference case to add a unique counter to each data URI, by adding the following around the makeDataURI
function, putting a different counter
attribute on the <svg> element in each data URI (this ensures that the testcase can't get any advantage from e.g. reusing the same VectorImage & underlying SVG document for cases where we're loading the same data URI in different <img>
size combinations):
let counter = 0;
function makeDataURI(width, height) {
[...]
// Add a counter attribute to make sure each URI is different:
uri += ` counter='${counter}'`;
counter++;
Profile of loading testcase 5 times: https://share.firefox.dev/4ipS71T
(marker chart shows DocumentLoad durations of ~340ms)
Profile of loading reference case 5 times: https://share.firefox.dev/3YEGGwj
(marker chart shows DocumentLoad durations of ~1.7s)
Comment 5•7 days ago
•
|
||
And here's a comparison of testcase/old-slow-reference-case in latest Nightly which is probably more important for actual performance characteristics (whereas my previous comment was capturing profiles in a debug build):
Profile of loading testcase 5 times: https://share.firefox.dev/4jBsNqT
(marker chart shows DocumentLoad durations of 113-160ms)
Profile of loading reference case 5 times: https://share.firefox.dev/3GqCz0A
(marker chart shows DocumentLoad durations of ~320-350ms)
In an opt build, this difference isn't super substantial, and our load time is on par with Chrome as well, so I'm not too worried about this needing further investigation right now.
I suspect the primary difference between the fast testcase and slow reference case is that we support progressive loading of <embed>-hosted documents, which means we've got various opportunities to yield and allow for style/layout/paint flushes, etc; and that yielding happens to slow things down a bit in this somewhat pathological 144-embed-elements scenario (particularly in a scenario like android/asan/tsan where everything takes longer and we perhaps yield more times); as compared to the testcase which has 144 images which IIRC force the image document to be fully parsed without any yielding-for-partial-rendering.
Comment hidden (Intermittent Failures Robot) |
Comment 7•4 days ago
|
||
In any case, it looks like the intermittent failures (timeouts) went away as of bug 1961245 landing. Closing as FIXED by that.
Description
•