Closed Bug 1628642 Opened 4 years ago Closed 2 years ago

Intermittent toolkit/components/extensions/test/mochitest/test_ext_streamfilter_multiple.html | Correctly intercepted page content - got "Start Middle\n", expected "Start Middle\n End"

Categories

(WebExtensions :: General, defect, P3)

defect
Points:
2

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1777019

People

(Reporter: intermittent-bug-filer, Unassigned)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: intermittent-failure, regression)

Attachments

(2 files)

Filed by: btara [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=296919334&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Zt4Z1hxxQYK5xIH0SmYNyw/runs/0/artifacts/public/logs/live_backing.log
Reftest URL: https://hg.mozilla.org/mozilla-central/raw-file/tip/layout/tools/reftest/reftest-analyzer.xhtml#logurl=https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Zt4Z1hxxQYK5xIH0SmYNyw/runs/0/artifacts/public/logs/live_backing.log&only_show_unexpected=1


[task 2020-04-09T09:24:34.911Z] 09:24:34     INFO - TEST-START | toolkit/components/extensions/test/mochitest/test_ext_streamfilter_multiple.html
[task 2020-04-09T09:24:34.996Z] 09:24:34     INFO - GECKO(2265) | ###!!! [Child][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
[task 2020-04-09T09:24:35.010Z] 09:24:35     INFO - TEST-INFO | started process screencapture
[task 2020-04-09T09:24:35.177Z] 09:24:35     INFO - TEST-INFO | screencapture: exit 0
[task 2020-04-09T09:24:35.177Z] 09:24:35     INFO - Buffered messages logged at 09:24:34
[task 2020-04-09T09:24:35.178Z] 09:24:35     INFO - add_task | Entering test 
[task 2020-04-09T09:24:35.178Z] 09:24:35     INFO - Extension loaded
[task 2020-04-09T09:24:35.178Z] 09:24:35     INFO - Extension loaded
[task 2020-04-09T09:24:35.178Z] 09:24:35     INFO - Buffered messages finished
[task 2020-04-09T09:24:35.178Z] 09:24:35     INFO - TEST-UNEXPECTED-FAIL | toolkit/components/extensions/test/mochitest/test_ext_streamfilter_multiple.html | Correctly intercepted page content - got "Start Middle\n", expected "Start Middle\n End"
[task 2020-04-09T09:24:35.178Z] 09:24:35     INFO -     SimpleTest.is@SimpleTest/SimpleTest.js:383:14
[task 2020-04-09T09:24:35.178Z] 09:24:35     INFO -     @toolkit/components/extensions/test/mochitest/test_ext_streamfilter_multiple.html:82:14
[task 2020-04-09T09:24:35.179Z] 09:24:35     INFO -     async*nextTick/<@SimpleTest/SimpleTest.js:2090:34
[task 2020-04-09T09:24:35.179Z] 09:24:35     INFO -     nextTick@SimpleTest/SimpleTest.js:2115:11
[task 2020-04-09T09:24:35.179Z] 09:24:35     INFO -     setTimeout handler*SimpleTest_setTimeoutShim@SimpleTest/SimpleTest.js:788:41
[task 2020-04-09T09:24:35.179Z] 09:24:35     INFO -     add_task@SimpleTest/SimpleTest.js:2045:17
[task 2020-04-09T09:24:35.179Z] 09:24:35     INFO -     @toolkit/components/extensions/test/mochitest/test_ext_streamfilter_multiple.html:18:9
[task 2020-04-09T09:24:35.179Z] 09:24:35     INFO - add_task | Leaving test 
[task 2020-04-09T09:24:35.179Z] 09:24:35     INFO - GECKO(2265) | MEMORY STAT | vsize 6723MB | residentFast 92MB | heapAllocated 12MB
[task 2020-04-09T09:24:35.179Z] 09:24:35     INFO - TEST-OK | toolkit/components/extensions/test/mochitest/test_ext_streamfilter_multiple.html | took 118ms
Has Regression Range: --- → yes
Summary: Intermittent Tier 2 Test Verify unix only toolkit/components/extensions/test/mochitest/test_ext_streamfilter_multiple.html | Correctly intercepted page content - got "Start Middle\n", expected "Start Middle\n End" → Intermittent unix only toolkit/components/extensions/test/mochitest/test_ext_streamfilter_multiple.html | Correctly intercepted page content - got "Start Middle\n", expected "Start Middle\n End"
Summary: Intermittent unix only toolkit/components/extensions/test/mochitest/test_ext_streamfilter_multiple.html | Correctly intercepted page content - got "Start Middle\n", expected "Start Middle\n End" → Intermittent toolkit/components/extensions/test/mochitest/test_ext_streamfilter_multiple.html | Correctly intercepted page content - got "Start Middle\n", expected "Start Middle\n End"
Flags: needinfo?(matt.woodrow)
Whiteboard: [retriggered]

The test run in comment 5 is to see if this is a timing issue in the test itself.

Flags: needinfo?(mixedpuppy)

Pernosco link: https://pernos.co/debug/igUdXYLFWmZBXz2g0FTaAw/index.html

I'll try to debug this tomorrow.

It looks like the issue happens when the second extension runs first.

The second extension is emitting data in two chunks (one with the page content, one with the extra appended data).

The first extension handles ondata once, prepending its content and then disconnecting. It doesn't wait for the second ondata to forward that through.

I think the first extension should do its prepending in onstart, directly copy all data in ondata, and then disconnect in onstop.

Flags: needinfo?(matt.woodrow)

Actually, that seems like it should behave fine, since the first extension is disconnecting.

It looks like we get to StreamFilterParent::DoSendData with aData=" End" (the second chunk of data from second extension being received by the filter parent for first extension). At this point mState==Disconnecting, so we don't call SendData, and just drop the data.

It seems like a pretty bad problem if disconnecting can race with later packets of the response and cause them to be dropped.

I'm not entirely sure how to fix this.

We could post the data back to the IOThread to call BufferData with it, but it's possible in the mean time for another OnDataAvaible call to have arrived, seen mState==Disconnecting and also called BufferData ahead of us (and thus in the wrong order).

I suspect we need to route all data through the actor thread (which is the only thread where mState is non-racy) so that we can resolve these ordering races.

Any thoughts kmag?

Flags: needinfo?(kmaglione+bmo)
Assignee: nobody → csabou
Status: NEW → ASSIGNED
Pushed by csabou@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/7b84e0046f62
Disable test_ext_streamfilter_multiple.html on all platforms for frequent failures. r=aryx
Keywords: leave-open
Whiteboard: [retriggered][stockwell disable-recommended] → [retriggered][stockwell disabled]

Given that this seems like a high probability of data loss or corruption and that it is not so uncommon to have a couple extensions doing filters, I think this bug needs to be a priority. I'm not certain how best to handle this issue, but it's probably also going to be related to bug 1629299.

Assignee: csabou → nobody
Status: ASSIGNED → NEW
Priority: P5 → P1

(In reply to Shane Caraveo (:mixedpuppy) from comment #17)

Given that this seems like a high probability of data loss or corruption and that it is not so uncommon to have a couple extensions doing filters, I think this bug needs to be a priority. I'm not certain how best to handle this issue, but it's probably also going to be related to bug 1629299.

I think it could happen even with a single extension, it just needs a network response delivered in multiple OnDataAvailable calls, and an extension that uses disconnect() from the not-last one.

As for fixing this, we have mBufferedData and the two-stage+round-trip disconnect to prevent races due to the asynchronous connection between StreamFilterParent/Child. Unfortunately, we also have a second async step from IO thread to actor thread, which isn't handled.

One solution could be to extend the round-trip disconnect synchronization to go through the IO thread (and also the main thread somehow).

Another would be to unconditionally forward OnDataAvailable to the actor thread, and only make decisions on that thread. This might add some more overhead, but seems simpler to get right.

Either way, I think we want to more clearly mark which thread each member variable is valid on, and try to racy accesses.

I think we need to check for the Disconnecting state in the DoSendData callback, and buffer the data rather than dropping it if we are disconnecting.

Flags: needinfo?(kmaglione+bmo)

@CosminS There were a few race fixes a month or so after this test was disabled, can we reenable the test for a bit and see if we get the intermittent back?

Flags: needinfo?(csabou)
Pushed by csabou@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ef0e4ae63cfd
Unskip test_ext_streamfilter_multiple.html to see if the failures are still present. a=testonly

Just pushed a testonly change and will circle back next week to see if the failures are still around.

Flags: needinfo?(csabou)
Whiteboard: [retriggered][stockwell disabled]

ni? me if they come back.

Flags: needinfo?(mixedpuppy)

Shane, there are 22 failures since the test was re-enabled, most on linux1804-64-asan: https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2020-10-21&endday=2020-10-28&tree=trunk&bug=1628642
Recent log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=320049341&repo=autoland&lineNumber=4756

[task 2020-10-28T19:06:25.301Z] 19:06:25     INFO - TEST-START | toolkit/components/extensions/test/mochitest/test_ext_streamfilter_multiple.html
[task 2020-10-28T19:06:25.760Z] 19:06:25     INFO - GECKO(3196) | ###!!! [Child][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
[task 2020-10-28T19:06:25.788Z] 19:06:25     INFO - TEST-INFO | started process screentopng
[task 2020-10-28T19:06:26.108Z] 19:06:26     INFO - TEST-INFO | screentopng: exit 0
[task 2020-10-28T19:06:26.109Z] 19:06:26     INFO - Buffered messages logged at 19:06:25
[task 2020-10-28T19:06:26.109Z] 19:06:26     INFO - add_task | Entering test 
[task 2020-10-28T19:06:26.109Z] 19:06:26     INFO - Extension loaded
[task 2020-10-28T19:06:26.109Z] 19:06:26     INFO - Extension loaded
[task 2020-10-28T19:06:26.109Z] 19:06:26     INFO - Buffered messages finished
[task 2020-10-28T19:06:26.110Z] 19:06:26     INFO - TEST-UNEXPECTED-FAIL | toolkit/components/extensions/test/mochitest/test_ext_streamfilter_multiple.html | Correctly intercepted page content - got "Start Middle\n", expected "Start Middle\n End"
[task 2020-10-28T19:06:26.110Z] 19:06:26     INFO -     SimpleTest.is@SimpleTest/SimpleTest.js:500:14
[task 2020-10-28T19:06:26.110Z] 19:06:26     INFO -     @toolkit/components/extensions/test/mochitest/test_ext_streamfilter_multiple.html:82:14
[task 2020-10-28T19:06:26.111Z] 19:06:26     INFO - add_task | Leaving test 
[task 2020-10-28T19:06:26.112Z] 19:06:26     INFO - GECKO(3196) | MEMORY STAT | vsize 20974599MB | residentFast 861MB
[task 2020-10-28T19:06:26.112Z] 19:06:26     INFO - TEST-OK | toolkit/components/extensions/test/mochitest/test_ext_streamfilter_multiple.html | took 580ms
[task 2020-10-28T19:06:26.113Z] 19:06:26     INFO - TEST-START | toolkit/components/extensions/test/mochitest/test_ext_streamfilter_processswitch.html

Could you have a look over them? Thank you.

Flags: needinfo?(mixedpuppy)

Ok, this will require some more investigation. I'm wondering if disabling it on linux and win7 (or maybe make it osx-only) would be sufficient to bring the numbers down low enough to be acceptable, that way at least the test is running.

Flags: needinfo?(mixedpuppy) → needinfo?(csabou)
Assignee: nobody → csabou
Status: NEW → ASSIGNED

Most failures are on opt builds and linux so made a disable patch for that to bring the failures down.
https://treeherder.mozilla.org/intermittent-failures/bugdetails?startday=2020-10-08&endday=2020-11-07&tree=trunk&bug=1628642

Flags: needinfo?(csabou)
Pushed by csabou@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/32fe9aa2c5c4
Disable test_ext_streamfilter_multiple.html on linux and !debug all platforms. r=mixedpuppy,#intermittent-reviewers
Assignee: csabou → nobody
Status: ASSIGNED → NEW
Points: --- → 2
Severity: normal → --
Priority: P1 → --
Severity: -- → S4
Priority: -- → P3
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: