Intermittent toolkit/components/extensions/test/mochitest/test_ext_streamfilter_multiple.html | Correctly intercepted page content - got "Start Middle\n", expected "Start Middle\n End"
Categories
(WebExtensions :: General, defect, P3)
Tracking
(Not tracked)
People
(Reporter: intermittent-bug-filer, Unassigned)
References
(Blocks 1 open bug, Regression)
Details
(Keywords: intermittent-failure, regression)
Attachments
(2 files)
Filed by: btara [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=296919334&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Zt4Z1hxxQYK5xIH0SmYNyw/runs/0/artifacts/public/logs/live_backing.log
Reftest URL: https://hg.mozilla.org/mozilla-central/raw-file/tip/layout/tools/reftest/reftest-analyzer.xhtml#logurl=https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Zt4Z1hxxQYK5xIH0SmYNyw/runs/0/artifacts/public/logs/live_backing.log&only_show_unexpected=1
[task 2020-04-09T09:24:34.911Z] 09:24:34 INFO - TEST-START | toolkit/components/extensions/test/mochitest/test_ext_streamfilter_multiple.html
[task 2020-04-09T09:24:34.996Z] 09:24:34 INFO - GECKO(2265) | ###!!! [Child][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
[task 2020-04-09T09:24:35.010Z] 09:24:35 INFO - TEST-INFO | started process screencapture
[task 2020-04-09T09:24:35.177Z] 09:24:35 INFO - TEST-INFO | screencapture: exit 0
[task 2020-04-09T09:24:35.177Z] 09:24:35 INFO - Buffered messages logged at 09:24:34
[task 2020-04-09T09:24:35.178Z] 09:24:35 INFO - add_task | Entering test
[task 2020-04-09T09:24:35.178Z] 09:24:35 INFO - Extension loaded
[task 2020-04-09T09:24:35.178Z] 09:24:35 INFO - Extension loaded
[task 2020-04-09T09:24:35.178Z] 09:24:35 INFO - Buffered messages finished
[task 2020-04-09T09:24:35.178Z] 09:24:35 INFO - TEST-UNEXPECTED-FAIL | toolkit/components/extensions/test/mochitest/test_ext_streamfilter_multiple.html | Correctly intercepted page content - got "Start Middle\n", expected "Start Middle\n End"
[task 2020-04-09T09:24:35.178Z] 09:24:35 INFO - SimpleTest.is@SimpleTest/SimpleTest.js:383:14
[task 2020-04-09T09:24:35.178Z] 09:24:35 INFO - @toolkit/components/extensions/test/mochitest/test_ext_streamfilter_multiple.html:82:14
[task 2020-04-09T09:24:35.179Z] 09:24:35 INFO - async*nextTick/<@SimpleTest/SimpleTest.js:2090:34
[task 2020-04-09T09:24:35.179Z] 09:24:35 INFO - nextTick@SimpleTest/SimpleTest.js:2115:11
[task 2020-04-09T09:24:35.179Z] 09:24:35 INFO - setTimeout handler*SimpleTest_setTimeoutShim@SimpleTest/SimpleTest.js:788:41
[task 2020-04-09T09:24:35.179Z] 09:24:35 INFO - add_task@SimpleTest/SimpleTest.js:2045:17
[task 2020-04-09T09:24:35.179Z] 09:24:35 INFO - @toolkit/components/extensions/test/mochitest/test_ext_streamfilter_multiple.html:18:9
[task 2020-04-09T09:24:35.179Z] 09:24:35 INFO - add_task | Leaving test
[task 2020-04-09T09:24:35.179Z] 09:24:35 INFO - GECKO(2265) | MEMORY STAT | vsize 6723MB | residentFast 92MB | heapAllocated 12MB
[task 2020-04-09T09:24:35.179Z] 09:24:35 INFO - TEST-OK | toolkit/components/extensions/test/mochitest/test_ext_streamfilter_multiple.html | took 118ms
Updated•4 years ago
|
Comment 1•4 years ago
|
||
Comment 2•4 years ago
|
||
It started to pop up on Windows too.
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=296958731&repo=autoland&lineNumber=9742
Updated•4 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 5•4 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=27de46f16eff3f8d03eba9fc39e5db12a69068e3
Comment 6•4 years ago
|
||
The test run in comment 5 is to see if this is a timing issue in the test itself.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 9•4 years ago
|
||
The failure appeared: https://treeherder.mozilla.org/#/jobs?repo=try&revision=27de46f16eff3f8d03eba9fc39e5db12a69068e3&selectedJob=297251812&group_state=expanded
Shane, should we disable the test for the time being?
Thank you.
Comment 10•4 years ago
|
||
Pernosco link: https://pernos.co/debug/igUdXYLFWmZBXz2g0FTaAw/index.html
I'll try to debug this tomorrow.
Comment 11•4 years ago
|
||
It looks like the issue happens when the second extension runs first.
The second extension is emitting data in two chunks (one with the page content, one with the extra appended data).
The first extension handles ondata once, prepending its content and then disconnecting. It doesn't wait for the second ondata to forward that through.
I think the first extension should do its prepending in onstart, directly copy all data in ondata, and then disconnect in onstop.
Comment 12•4 years ago
|
||
Actually, that seems like it should behave fine, since the first extension is disconnecting.
It looks like we get to StreamFilterParent::DoSendData with aData=" End" (the second chunk of data from second extension being received by the filter parent for first extension). At this point mState==Disconnecting, so we don't call SendData, and just drop the data.
It seems like a pretty bad problem if disconnecting can race with later packets of the response and cause them to be dropped.
Comment 13•4 years ago
|
||
I'm not entirely sure how to fix this.
We could post the data back to the IOThread to call BufferData with it, but it's possible in the mean time for another OnDataAvaible call to have arrived, seen mState==Disconnecting and also called BufferData ahead of us (and thus in the wrong order).
I suspect we need to route all data through the actor thread (which is the only thread where mState is non-racy) so that we can resolve these ordering races.
Any thoughts kmag?
Comment hidden (Intermittent Failures Robot) |
Comment 15•4 years ago
|
||
Updated•4 years ago
|
Comment 16•4 years ago
|
||
Pushed by csabou@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/7b84e0046f62 Disable test_ext_streamfilter_multiple.html on all platforms for frequent failures. r=aryx
Updated•4 years ago
|
Comment 17•4 years ago
|
||
Given that this seems like a high probability of data loss or corruption and that it is not so uncommon to have a couple extensions doing filters, I think this bug needs to be a priority. I'm not certain how best to handle this issue, but it's probably also going to be related to bug 1629299.
Comment 18•4 years ago
|
||
(In reply to Shane Caraveo (:mixedpuppy) from comment #17)
Given that this seems like a high probability of data loss or corruption and that it is not so uncommon to have a couple extensions doing filters, I think this bug needs to be a priority. I'm not certain how best to handle this issue, but it's probably also going to be related to bug 1629299.
I think it could happen even with a single extension, it just needs a network response delivered in multiple OnDataAvailable calls, and an extension that uses disconnect() from the not-last one.
As for fixing this, we have mBufferedData and the two-stage+round-trip disconnect to prevent races due to the asynchronous connection between StreamFilterParent/Child. Unfortunately, we also have a second async step from IO thread to actor thread, which isn't handled.
One solution could be to extend the round-trip disconnect synchronization to go through the IO thread (and also the main thread somehow).
Another would be to unconditionally forward OnDataAvailable to the actor thread, and only make decisions on that thread. This might add some more overhead, but seems simpler to get right.
Either way, I think we want to more clearly mark which thread each member variable is valid on, and try to racy accesses.
Comment 19•4 years ago
|
||
bugherder |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 22•4 years ago
|
||
I think we need to check for the Disconnecting state in the DoSendData callback, and buffer the data rather than dropping it if we are disconnecting.
Comment 23•4 years ago
|
||
@CosminS There were a few race fixes a month or so after this test was disabled, can we reenable the test for a bit and see if we get the intermittent back?
Comment 24•4 years ago
|
||
Pushed by csabou@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/ef0e4ae63cfd Unskip test_ext_streamfilter_multiple.html to see if the failures are still present. a=testonly
Comment 25•4 years ago
|
||
Just pushed a testonly change and will circle back next week to see if the failures are still around.
Updated•4 years ago
|
Comment 27•4 years ago
|
||
bugherder |
Comment hidden (Intermittent Failures Robot) |
Comment 29•4 years ago
|
||
Shane, there are 22 failures since the test was re-enabled, most on linux1804-64-asan: https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2020-10-21&endday=2020-10-28&tree=trunk&bug=1628642
Recent log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=320049341&repo=autoland&lineNumber=4756
[task 2020-10-28T19:06:25.301Z] 19:06:25 INFO - TEST-START | toolkit/components/extensions/test/mochitest/test_ext_streamfilter_multiple.html
[task 2020-10-28T19:06:25.760Z] 19:06:25 INFO - GECKO(3196) | ###!!! [Child][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
[task 2020-10-28T19:06:25.788Z] 19:06:25 INFO - TEST-INFO | started process screentopng
[task 2020-10-28T19:06:26.108Z] 19:06:26 INFO - TEST-INFO | screentopng: exit 0
[task 2020-10-28T19:06:26.109Z] 19:06:26 INFO - Buffered messages logged at 19:06:25
[task 2020-10-28T19:06:26.109Z] 19:06:26 INFO - add_task | Entering test
[task 2020-10-28T19:06:26.109Z] 19:06:26 INFO - Extension loaded
[task 2020-10-28T19:06:26.109Z] 19:06:26 INFO - Extension loaded
[task 2020-10-28T19:06:26.109Z] 19:06:26 INFO - Buffered messages finished
[task 2020-10-28T19:06:26.110Z] 19:06:26 INFO - TEST-UNEXPECTED-FAIL | toolkit/components/extensions/test/mochitest/test_ext_streamfilter_multiple.html | Correctly intercepted page content - got "Start Middle\n", expected "Start Middle\n End"
[task 2020-10-28T19:06:26.110Z] 19:06:26 INFO - SimpleTest.is@SimpleTest/SimpleTest.js:500:14
[task 2020-10-28T19:06:26.110Z] 19:06:26 INFO - @toolkit/components/extensions/test/mochitest/test_ext_streamfilter_multiple.html:82:14
[task 2020-10-28T19:06:26.111Z] 19:06:26 INFO - add_task | Leaving test
[task 2020-10-28T19:06:26.112Z] 19:06:26 INFO - GECKO(3196) | MEMORY STAT | vsize 20974599MB | residentFast 861MB
[task 2020-10-28T19:06:26.112Z] 19:06:26 INFO - TEST-OK | toolkit/components/extensions/test/mochitest/test_ext_streamfilter_multiple.html | took 580ms
[task 2020-10-28T19:06:26.113Z] 19:06:26 INFO - TEST-START | toolkit/components/extensions/test/mochitest/test_ext_streamfilter_processswitch.html
Could you have a look over them? Thank you.
Comment hidden (Intermittent Failures Robot) |
Comment 31•4 years ago
|
||
Ok, this will require some more investigation. I'm wondering if disabling it on linux and win7 (or maybe make it osx-only) would be sufficient to bring the numbers down low enough to be acceptable, that way at least the test is running.
Comment 32•4 years ago
|
||
Updated•4 years ago
|
Comment 33•4 years ago
|
||
Most failures are on opt builds and linux so made a disable patch for that to bring the failures down.
https://treeherder.mozilla.org/intermittent-failures/bugdetails?startday=2020-10-08&endday=2020-11-07&tree=trunk&bug=1628642
Comment hidden (Intermittent Failures Robot) |
Comment 35•4 years ago
|
||
Pushed by csabou@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/32fe9aa2c5c4 Disable test_ext_streamfilter_multiple.html on linux and !debug all platforms. r=mixedpuppy,#intermittent-reviewers
Comment 36•4 years ago
|
||
bugherder |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Updated•3 years ago
|
Updated•3 years ago
|
Comment hidden (Intermittent Failures Robot) |
Updated•3 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Updated•2 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Updated•2 years ago
|
Comment hidden (Intermittent Failures Robot) |
Updated•2 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Updated•2 years ago
|
Updated•2 years ago
|
Description
•