Closed Bug 1672238 Opened 1 year ago Closed 1 year ago

Perma [Fission] TEST-UNEXPECTED-TIMEOUT | http://mochi.test:8888/tests/uriloader/exthandler/tests/mochitest/test_nullCharFile.xhtml | application timed out after 370 seconds with no output

Categories

(Firefox :: File Handling, defect, P2)

defect
Points:
1

Tracking

()

RESOLVED FIXED
84 Branch
Iteration:
84.1 - Oct 19 - Nov 01
Tracking Status
firefox-esr78 --- unaffected
firefox81 --- unaffected
firefox82 --- unaffected
firefox83 --- unaffected
firefox84 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: mak)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: intermittent-failure, Whiteboard: [perf:apert:?])

Attachments

(1 file)

Filed by: rmaries [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=319145705&repo=mozilla-central
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/HMU7OshjTZ29px9o5UaBng/runs/0/artifacts/public/logs/live_backing.log


[task 2020-10-20T15:56:49.025Z] 15:56:49     INFO - TEST-START | http://mochi.test:8888/tests/uriloader/exthandler/tests/mochitest/test_nullCharFile.xhtml
[task 2020-10-20T16:13:16.493Z] 16:13:16     INFO - add_task | Entering test 
[task 2020-10-20T16:13:16.493Z] 16:13:16     INFO - Buffered messages finished
[task 2020-10-20T16:13:16.494Z] 16:13:16    ERROR - TEST-UNEXPECTED-TIMEOUT | http://mochi.test:8888/tests/uriloader/exthandler/tests/mochitest/test_nullCharFile.xhtml | application timed out after 370 seconds with no output
[task 2020-10-20T16:13:16.494Z] 16:13:16    ERROR - Force-terminating active process(es).
[task 2020-10-20T16:13:16.494Z] 16:13:16     INFO - Determining child pids from psutil...
[task 2020-10-20T16:13:16.522Z] 16:13:16     INFO - [20632, 20659, 20672, 20580, 20556, 20763, 20703, 20791, 20729]
[task 2020-10-20T16:13:16.525Z] 16:13:16     INFO - ==> process 20412 launched child process 20432
[task 2020-10-20T16:13:16.527Z] 16:13:16     INFO - ==> process 20412 launched child process 20556
[task 2020-10-20T16:13:16.529Z] 16:13:16     INFO - ==> process 20412 launched child process 20580
[task 2020-10-20T16:13:16.530Z] 16:13:16     INFO - ==> process 20412 launched child process 20632
[task 2020-10-20T16:13:16.531Z] 16:13:16     INFO - ==> process 20412 launched child process 20659
[task 2020-10-20T16:13:16.533Z] 16:13:16     INFO - ==> process 20412 launched child process 20672
[task 2020-10-20T16:13:16.534Z] 16:13:16     INFO - ==> process 20412 launched child process 20703
[task 2020-10-20T16:13:16.535Z] 16:13:16     INFO - ==> process 20412 launched child process 20729
[task 2020-10-20T16:13:16.535Z] 16:13:16     INFO - ==> process 20412 launched child process 20763
[task 2020-10-20T16:13:16.535Z] 16:13:16     INFO - ==> process 20412 launched child process 20791
[task 2020-10-20T16:13:16.535Z] 16:13:16     INFO - Found child pids: set([20672, 20580, 20556, 20432, 20659, 20791, 20632, 20729, 20763, 20703])
[task 2020-10-20T16:13:16.536Z] 16:13:16     INFO - Failed to get child procs
[task 2020-10-20T16:13:16.536Z] 16:13:16     INFO - Killing process: 20672
[task 2020-10-20T16:13:16.537Z] 16:13:16     INFO - TEST-INFO | started process screentopng
[task 2020-10-20T16:13:16.689Z] 16:13:16     INFO - TEST-INFO | screentopng: exit 0```
Flags: needinfo?(mak)
Regressed by: CVE-2020-15658
Summary: Perma tier 2 TEST-UNEXPECTED-TIMEOUT | http://mochi.test:8888/tests/uriloader/exthandler/tests/mochitest/test_nullCharFile.xhtml | application timed out after 370 seconds with no output → Perma [Fission] TEST-UNEXPECTED-TIMEOUT | http://mochi.test:8888/tests/uriloader/exthandler/tests/mochitest/test_nullCharFile.xhtml | application timed out after 370 seconds with no output
Assignee: nobody → mak
Status: NEW → ASSIGNED
Iteration: --- → 84.1 - Oct 19 - Nov 01
Points: --- → 1
Flags: needinfo?(mak)
Priority: P5 → P2

This is apparenly only Linux debug, and what I see in the log is:

###!!! ASSERTION: AsyncWait failed: 'Error', file /builds/worker/checkouts/gecko/netwerk/base/nsInputStreamPump.cpp:128
WARNING: Blob data was not retrieved!: file /builds/worker/checkouts/gecko/dom/file/uri/BlobURLInputStream.cpp:437
#01: nsInputStreamPump::Resume() [netwerk/base/nsInputStreamPump.cpp:243]
#02: mozilla::net::DocumentLoadListener::FinishReplacementChannelSetup(nsresult) [netwerk/ipc/DocumentLoadListener.cpp:1078]
#03: mozilla::net::DocumentLoadListener::RedirectToRealChannelFinished(nsresult) [netwerk/ipc/DocumentLoadListener.cpp:1022]
...

I'm not sure what's up, I'll try reproducing on a Linux VM.

Blocks: 1652556

My patch changes the Blob to a data uri, that for this test doesn't really matter, though we would not want to hide possible bugs here. I don't know much about why a Blob in x-orig would cause the network code to bail out, my theory is related to the Blob's origin, but I don't know the network and x-orig details well enough to go further.
Do you want me to file a follow-up bug to investigate this, or is it expected?

Flags: needinfo?(kmaglione+bmo)

Marco, is your patch ok to be landed?

Flags: needinfo?(mak)

Yes, we can investigate the blob thing apart, it is not necessary for the test itself.

Flags: needinfo?(mak)
Pushed by mak77@bonardo.net:
https://hg.mozilla.org/integration/autoland/rev/e301b90c2b0e
Fix test_nullCharFile.xhtml on fission x-origin. r=Gijs
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 84 Branch

== Change summary for alert #27538 (as of Fri, 06 Nov 2020 07:47:05 GMT) ==

Improvements:

Ratio Suite Test Platform Options Absolute values (old vs new)
9% pinterest Similarity linux64-shippable-qr cold webrender 0.86 -> 0.94
8% pinterest Similarity2D linux64-shippable-qr cold webrender 0.92 -> 0.99

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=27538

The problem here is that when running in xorigin mode, the parent frame and the blob URI have different agent cluster IDs. And, since the load is happening in the frame, we compare the agent cluster ID of the parent frame to the one of the blob here and bail out.

I think that, for the purpose of this test, that doesn't matter, so the change is fine. But it also seems wrong. I'd expect an <a download> with a blob created by the page itself to work, regardless of the agent cluster of the parent frame. So that probably deserves a separate bug. Can you file one, please?

Flags: needinfo?(kmaglione+bmo) → needinfo?(mak)

(In reply to Alexandru Ionescu (needinfo me) [:alexandrui] from comment #10)

== Change summary for alert #27538 (as of Fri, 06 Nov 2020 07:47:05 GMT) ==

Improvements:

Ratio Suite Test Platform Options Absolute values (old vs new)
9% pinterest Similarity linux64-shippable-qr cold webrender 0.86 -> 0.94
8% pinterest Similarity2D linux64-shippable-qr cold webrender 0.92 -> 0.99

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=27538

There's absolutely no way this test-only patch could have affected those tests. Can you please remove the reference to this bug from the alert?

Flags: needinfo?(aionescu)
Blocks: 1676506

(In reply to Kris Maglione [:kmag] from comment #12)

There's absolutely no way this test-only patch could have affected those tests. Can you please remove the reference to this bug from the alert?

The graph is pretty clear that this is the bug that caused the improvements. Could be indirectly. I did some retriggers to confirm. I will link the alert to another bug as soon as I find the cause.

Flags: needinfo?(aionescu)
Flags: needinfo?(aionescu)
Flags: needinfo?(aionescu)
Whiteboard: [perf:apert:?]
Flags: needinfo?(aionescu)

The pushlog linked for the alert contains bug 1672597 (changes to image decoding) and the improvement is for Pinterest, an image heavy site. The improvement should be from that change (the change in this bug doesn't change the application, only a non-performance test).

Thanks, did some more retriggers.

I filed bug 1676506 for the Blob problem, thanks.

Flags: needinfo?(mak)

(In reply to Sebastian Hengst [:aryx] (needinfo on intermittent or backout) from comment #14)

The pushlog linked for the alert contains bug 1672597 (changes to image decoding) and the improvement is for Pinterest, an image heavy site. The improvement should be from that change (the change in this bug doesn't change the application, only a non-performance test).

The retriggers (see where's 9cbcab210 related to Alert 27538 - e301b90c2b0e) didn't reveal anything new. There's the possibility that this is just a spike and the cause to be a bit later, but not earlier. Did some more retriggers.

Flags: needinfo?(aionescu)
Flags: needinfo?(aionescu)

I'm sorry, :aryx, the retriggers didn't reveant anything new.

Flags: needinfo?(aionescu) → needinfo?(aryx.bugmail)

If one compares the first pinterest-vismet (old) and second one (newer), there is this difference:

old

[task 2020-10-27T19:48:08.914Z] INFO - Comparing videos to TASK_GROUP=OVoOn1h8Q7yuhB2B0QYAGg, TASK_ID=NeqmmcjiRHuaJDSsEGwqTQ
[task 2020-10-27T19:48:08.914Z] INFO - Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/NeqmmcjiRHuaJDSsEGwqTQ/artifacts/public/test_info/browsertime-results.tgz
[task 2020-10-27T19:48:09.698Z] INFO - Found 50 old videos
[task 2020-10-27T20:08:41.000Z] INFO - Average 3D similarity: 0.93241
[task 2020-10-27T20:08:41.000Z] INFO - Average 2D similarity: 0.99208

new

[task 2020-11-11T11:07:50.328Z] INFO - Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/P1dyBzJQSGunIMR5JvauPg/artifacts/public/test_info/browsertime-results.tgz
[task 2020-11-11T11:07:50.570Z] INFO - Found 50 old videos
[task 2020-11-11T11:29:53.070Z] INFO - Average 3D similarity: 0.8469
[task 2020-11-11T11:29:53.070Z] INFO - Average 2D similarity: 0.91059

Please check if the baseline (the different task ids) has a shift in performance metrics:

Flags: needinfo?(aryx.bugmail) → needinfo?(aionescu)
Flags: needinfo?(aionescu)
Flags: needinfo?(aionescu)
Flags: needinfo?(aionescu)

I've marked the improvement alert as invalid as the change was a simularity metric, which we shouldn't be alerting on. It also sounds like this wasn't the correct cause of the change. Alex: please don't clear needinfos without leaving a comment.

You need to log in before you can comment on or make changes to this bug.