Perma [Fission] TEST-UNEXPECTED-TIMEOUT | http://mochi.test:8888/tests/uriloader/exthandler/tests/mochitest/test_nullCharFile.xhtml | application timed out after 370 seconds with no output
Categories
(Firefox :: File Handling, defect, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr78 | --- | unaffected |
firefox81 | --- | unaffected |
firefox82 | --- | unaffected |
firefox83 | --- | unaffected |
firefox84 | --- | fixed |
People
(Reporter: intermittent-bug-filer, Assigned: mak)
References
(Blocks 1 open bug, Regression)
Details
(Keywords: intermittent-failure, regression, Whiteboard: [perf:apert:?])
Attachments
(1 file)
Filed by: rmaries [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=319145705&repo=mozilla-central
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/HMU7OshjTZ29px9o5UaBng/runs/0/artifacts/public/logs/live_backing.log
[task 2020-10-20T15:56:49.025Z] 15:56:49 INFO - TEST-START | http://mochi.test:8888/tests/uriloader/exthandler/tests/mochitest/test_nullCharFile.xhtml
[task 2020-10-20T16:13:16.493Z] 16:13:16 INFO - add_task | Entering test
[task 2020-10-20T16:13:16.493Z] 16:13:16 INFO - Buffered messages finished
[task 2020-10-20T16:13:16.494Z] 16:13:16 ERROR - TEST-UNEXPECTED-TIMEOUT | http://mochi.test:8888/tests/uriloader/exthandler/tests/mochitest/test_nullCharFile.xhtml | application timed out after 370 seconds with no output
[task 2020-10-20T16:13:16.494Z] 16:13:16 ERROR - Force-terminating active process(es).
[task 2020-10-20T16:13:16.494Z] 16:13:16 INFO - Determining child pids from psutil...
[task 2020-10-20T16:13:16.522Z] 16:13:16 INFO - [20632, 20659, 20672, 20580, 20556, 20763, 20703, 20791, 20729]
[task 2020-10-20T16:13:16.525Z] 16:13:16 INFO - ==> process 20412 launched child process 20432
[task 2020-10-20T16:13:16.527Z] 16:13:16 INFO - ==> process 20412 launched child process 20556
[task 2020-10-20T16:13:16.529Z] 16:13:16 INFO - ==> process 20412 launched child process 20580
[task 2020-10-20T16:13:16.530Z] 16:13:16 INFO - ==> process 20412 launched child process 20632
[task 2020-10-20T16:13:16.531Z] 16:13:16 INFO - ==> process 20412 launched child process 20659
[task 2020-10-20T16:13:16.533Z] 16:13:16 INFO - ==> process 20412 launched child process 20672
[task 2020-10-20T16:13:16.534Z] 16:13:16 INFO - ==> process 20412 launched child process 20703
[task 2020-10-20T16:13:16.535Z] 16:13:16 INFO - ==> process 20412 launched child process 20729
[task 2020-10-20T16:13:16.535Z] 16:13:16 INFO - ==> process 20412 launched child process 20763
[task 2020-10-20T16:13:16.535Z] 16:13:16 INFO - ==> process 20412 launched child process 20791
[task 2020-10-20T16:13:16.535Z] 16:13:16 INFO - Found child pids: set([20672, 20580, 20556, 20432, 20659, 20791, 20632, 20729, 20763, 20703])
[task 2020-10-20T16:13:16.536Z] 16:13:16 INFO - Failed to get child procs
[task 2020-10-20T16:13:16.536Z] 16:13:16 INFO - Killing process: 20672
[task 2020-10-20T16:13:16.537Z] 16:13:16 INFO - TEST-INFO | started process screentopng
[task 2020-10-20T16:13:16.689Z] 16:13:16 INFO - TEST-INFO | screentopng: exit 0```
![]() |
||
Updated•4 years ago
|
Updated•4 years ago
|
Assignee | ||
Updated•4 years ago
|
Assignee | ||
Updated•4 years ago
|
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 2•4 years ago
|
||
This is apparenly only Linux debug, and what I see in the log is:
###!!! ASSERTION: AsyncWait failed: 'Error', file /builds/worker/checkouts/gecko/netwerk/base/nsInputStreamPump.cpp:128
WARNING: Blob data was not retrieved!: file /builds/worker/checkouts/gecko/dom/file/uri/BlobURLInputStream.cpp:437
#01: nsInputStreamPump::Resume() [netwerk/base/nsInputStreamPump.cpp:243]
#02: mozilla::net::DocumentLoadListener::FinishReplacementChannelSetup(nsresult) [netwerk/ipc/DocumentLoadListener.cpp:1078]
#03: mozilla::net::DocumentLoadListener::RedirectToRealChannelFinished(nsresult) [netwerk/ipc/DocumentLoadListener.cpp:1022]
...
I'm not sure what's up, I'll try reproducing on a Linux VM.
Assignee | ||
Comment 3•4 years ago
|
||
Assignee | ||
Comment 4•4 years ago
•
|
||
My patch changes the Blob to a data uri, that for this test doesn't really matter, though we would not want to hide possible bugs here. I don't know much about why a Blob in x-orig would cause the network code to bail out, my theory is related to the Blob's origin, but I don't know the network and x-orig details well enough to go further.
Do you want me to file a follow-up bug to investigate this, or is it expected?
Assignee | ||
Comment 6•4 years ago
|
||
Yes, we can investigate the blob thing apart, it is not necessary for the test itself.
Comment 8•4 years ago
|
||
bugherder |
Comment hidden (Intermittent Failures Robot) |
Comment 10•4 years ago
|
||
== Change summary for alert #27538 (as of Fri, 06 Nov 2020 07:47:05 GMT) ==
Improvements:
Ratio | Suite | Test | Platform | Options | Absolute values (old vs new) |
---|---|---|---|---|---|
9% | Similarity | linux64-shippable-qr | cold webrender | 0.86 -> 0.94 | |
8% | Similarity2D | linux64-shippable-qr | cold webrender | 0.92 -> 0.99 |
For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=27538
Comment 11•4 years ago
|
||
The problem here is that when running in xorigin mode, the parent frame and the blob URI have different agent cluster IDs. And, since the load is happening in the frame, we compare the agent cluster ID of the parent frame to the one of the blob here and bail out.
I think that, for the purpose of this test, that doesn't matter, so the change is fine. But it also seems wrong. I'd expect an <a download>
with a blob created by the page itself to work, regardless of the agent cluster of the parent frame. So that probably deserves a separate bug. Can you file one, please?
Comment 12•4 years ago
|
||
(In reply to Alexandru Ionescu (needinfo me) [:alexandrui] from comment #10)
== Change summary for alert #27538 (as of Fri, 06 Nov 2020 07:47:05 GMT) ==
Improvements:
Ratio Suite Test Platform Options Absolute values (old vs new) 9% Similarity linux64-shippable-qr cold webrender 0.86 -> 0.94 8% Similarity2D linux64-shippable-qr cold webrender 0.92 -> 0.99 For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=27538
There's absolutely no way this test-only patch could have affected those tests. Can you please remove the reference to this bug from the alert?
Comment 13•4 years ago
|
||
(In reply to Kris Maglione [:kmag] from comment #12)
There's absolutely no way this test-only patch could have affected those tests. Can you please remove the reference to this bug from the alert?
The graph is pretty clear that this is the bug that caused the improvements. Could be indirectly. I did some retriggers to confirm. I will link the alert to another bug as soon as I find the cause.
Updated•4 years ago
|
Updated•4 years ago
|
Updated•4 years ago
|
![]() |
||
Comment 14•4 years ago
|
||
The pushlog linked for the alert contains bug 1672597 (changes to image decoding) and the improvement is for Pinterest, an image heavy site. The improvement should be from that change (the change in this bug doesn't change the application, only a non-performance test).
Comment 15•4 years ago
|
||
Thanks, did some more retriggers.
Assignee | ||
Comment 16•4 years ago
|
||
I filed bug 1676506 for the Blob problem, thanks.
Comment 17•4 years ago
|
||
(In reply to Sebastian Hengst [:aryx] (needinfo on intermittent or backout) from comment #14)
The pushlog linked for the alert contains bug 1672597 (changes to image decoding) and the improvement is for Pinterest, an image heavy site. The improvement should be from that change (the change in this bug doesn't change the application, only a non-performance test).
The retriggers (see where's 9cbcab210 related to Alert 27538 - e301b90c2b0e) didn't reveal anything new. There's the possibility that this is just a spike and the cause to be a bit later, but not earlier. Did some more retriggers.
Updated•4 years ago
|
Comment 18•4 years ago
|
||
I'm sorry, :aryx, the retriggers didn't reveant anything new.
![]() |
||
Comment 19•4 years ago
|
||
If one compares the first pinterest-vismet (old) and second one (newer), there is this difference:
old
[task 2020-10-27T19:48:08.914Z] INFO - Comparing videos to TASK_GROUP=OVoOn1h8Q7yuhB2B0QYAGg, TASK_ID=NeqmmcjiRHuaJDSsEGwqTQ
[task 2020-10-27T19:48:08.914Z] INFO - Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/NeqmmcjiRHuaJDSsEGwqTQ/artifacts/public/test_info/browsertime-results.tgz
[task 2020-10-27T19:48:09.698Z] INFO - Found 50 old videos
[task 2020-10-27T20:08:41.000Z] INFO - Average 3D similarity: 0.93241
[task 2020-10-27T20:08:41.000Z] INFO - Average 2D similarity: 0.99208
new
[task 2020-11-11T11:07:50.328Z] INFO - Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/P1dyBzJQSGunIMR5JvauPg/artifacts/public/test_info/browsertime-results.tgz
[task 2020-11-11T11:07:50.570Z] INFO - Found 50 old videos
[task 2020-11-11T11:29:53.070Z] INFO - Average 3D similarity: 0.8469
[task 2020-11-11T11:29:53.070Z] INFO - Average 2D similarity: 0.91059
Please check if the baseline (the different task ids) has a shift in performance metrics:
Updated•4 years ago
|
Updated•4 years ago
|
Updated•4 years ago
|
Comment 20•4 years ago
|
||
I've marked the improvement alert as invalid as the change was a simularity metric, which we shouldn't be alerting on. It also sounds like this wasn't the correct cause of the change. Alex: please don't clear needinfos without leaving a comment.
Description
•