Closed Bug 1650996 Opened 4 years ago Closed 4 years ago

Stop the RemoteDecoderChild from using a dedicated thread

Categories

(Core :: Audio/Video: Playback, enhancement)

enhancement

Tracking

()

RESOLVED FIXED
mozilla80
Tracking Status
firefox80 --- fixed

People

(Reporter: jya, Assigned: jya)

References

Details

Attachments

(3 files)

We can instead use a TaskQueue running on top of a media threadpool now that those are usable with IPC's MessageChannel

The base IProtocol class already provides this.

This allows to simplify the lifetime management off the RemoteDecoderModule's mManagerThread member.

Depends on D82501

There are some reftests that launch dozens of video elements at once, on Windows the GpuDecoderModule dispatches a sync runnable to the RemoteDecoderManagerChild thread ; if we have more than 4 (the default for the "PLAYBACK" threadpool) then we won't spawn a new thread and all decoders ends up being blocked waiting for the task on the RemoteDecoderManagerChild to finally run.

Instead of relying on a sync dispatch to initialise IPDL, we should probably move that into the main GpuRemoteVideoDecoderChild::Init method so that it can be made to be fully async.

We would need to have a fallback mechanism to the remaining decoder if Init failed however, right now we only have a fallback to try other PDM if creation of the decoder failed rather than its initialisation failing.

Attachment #9161831 - Attachment description: Bug 1650996 - P3. Have RemoteDecoderChild use a TaskQueue over a media threadpool. r?mjf,mattwoodrow → Bug 1650996 - P3. Have RemoteDecoderManagerChild use a TaskQueue over a media threadpool. r?mjf,mattwoodrow
Pushed by jyavenard@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6f779a3875a8
P1. Remove RemodeDecoderManagerChild::CanSend method. r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/35faa46f46f1
P2. Always initialise the RemoteDecoderChild manager thread. r=mjf
https://hg.mozilla.org/integration/autoland/rev/de5e55b59a31
P3. Have RemoteDecoderManagerChild use a TaskQueue over a media threadpool. r=mjf

Backed out 5 changesets (bug 1650996, bug 1649974) for hazard failures

Push with failure: https://treeherder.mozilla.org/#/jobs?repo=autoland&group_state=expanded&searchStr=linux%2Cx64%2Cdebug%2Chazard-linux64-haz%2Fdebug%2C%28h%29&fromchange=d3f7f11b80ea63a55d4d381b88cd758f12714a06&selectedTaskRun=P-lFOs1KSvmTMZ9Xz_CkWA.0&tochange=47a51ee437a2b271872fca1b25b649c36509fb35

Backout link: https://hg.mozilla.org/integration/autoland/rev/d8abefec5bba959f42bac66237b9eabe2d04dfaa

Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=308963824&repo=autoland&lineNumber=73953

...
[task 2020-07-08T12:51:30.770Z] Running explain to generate ('hazards.txt', 'unnecessary.txt', 'refs.txt')
[task 2020-07-08T12:51:30.770Z] LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/builds/worker/workspace/obj-haz-shell/dist/bin" PATH="/builds/worker/fetches/sixgill/usr/bin:${PATH}" XDB='/builds/worker/fetches/sixgill/usr/bin/xdb.so' SOURCE='/builds/worker/checkouts/gecko' ANALYZED_OBJDIR='/builds/worker/workspace/obj-analyzed' python2.7 /builds/worker/checkouts/gecko/js/src/devtools/rootAnalysis/explain.py rootingHazards.txt gcFunctions.txt hazards.txt unnecessary.txt refs.txt
[task 2020-07-08T12:51:33.400Z] Wrote explained_hazards.tmp
[task 2020-07-08T12:51:33.400Z] Wrote unnecessary.tmp
[task 2020-07-08T12:51:33.400Z] Wrote refs.tmp
[task 2020-07-08T12:51:33.400Z] Found 2 hazards 200 unsafe references 0 missing
[task 2020-07-08T12:51:33.401Z] Running heapwrites to generate heapWriteHazards.txt
[task 2020-07-08T12:51:33.401Z] LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/builds/worker/workspace/obj-haz-shell/dist/bin" PATH="/builds/worker/fetches/sixgill/usr/bin:${PATH}" XDB='/builds/worker/fetches/sixgill/usr/bin/xdb.so' SOURCE='/builds/worker/checkouts/gecko' ANALYZED_OBJDIR='/builds/worker/workspace/obj-analyzed' /builds/worker/workspace/obj-haz-shell/dist/bin/js /builds/worker/checkouts/gecko/js/src/devtools/rootAnalysis/analyzeHeapWrites.js > heapWriteHazards.txt
[task 2020-07-08T12:52:01.488Z] + check_hazards /builds/worker/workspace/analysis
[task 2020-07-08T12:52:01.489Z] + set +e
[task 2020-07-08T12:52:01.489Z] ++ grep -c 'Function.*has unrooted.*live across GC call' /builds/worker/workspace/analysis/rootingHazards.txt
[task 2020-07-08T12:52:01.490Z] + NUM_HAZARDS=2
[task 2020-07-08T12:52:01.490Z] ++ grep -c '^Function.*takes unsafe address of unrooted' /builds/worker/workspace/analysis/refs.txt
[task 2020-07-08T12:52:01.492Z] + NUM_UNSAFE=200
[task 2020-07-08T12:52:01.492Z] ++ grep -c '^Function.* has unnecessary root' /builds/worker/workspace/analysis/unnecessary.txt
[task 2020-07-08T12:52:01.494Z] + NUM_UNNECESSARY=1314
[task 2020-07-08T12:52:01.494Z] ++ grep -c '^Dropped CFG' /builds/worker/workspace/analysis/build_xgill.log
[task 2020-07-08T12:52:01.618Z] + NUM_DROPPED=0
[task 2020-07-08T12:52:01.619Z] ++ perl -lne 'print $1 if m!found (\d+)/\d+ allowed errors!' /builds/worker/workspace/analysis/heapWriteHazards.txt
[task 2020-07-08T12:52:01.621Z] + NUM_WRITE_HAZARDS=0
[task 2020-07-08T12:52:01.621Z] ++ grep -c '^Function.*expected hazard.*but none were found' /builds/worker/workspace/analysis/rootingHazards.txt
[task 2020-07-08T12:52:01.622Z] + NUM_MISSING=0
[task 2020-07-08T12:52:01.622Z] + set +x
[task 2020-07-08T12:52:01.622Z] TinderboxPrint: rooting hazards<br/>2
[task 2020-07-08T12:52:01.622Z] TinderboxPrint: (unsafe references to unrooted GC pointers)<br/>200
[task 2020-07-08T12:52:01.622Z] TinderboxPrint: (unnecessary roots)<br/>1314
[task 2020-07-08T12:52:01.622Z] TinderboxPrint: missing expected hazards<br/>0
[task 2020-07-08T12:52:01.622Z] TinderboxPrint: heap write hazards<br/>0
[task 2020-07-08T12:52:01.624Z] TEST-UNEXPECTED-FAIL | hazards | unrooted 'scopedArr' of type 'mozilla::dom::TypedArray<unsigned char, js::UnwrapUint8ClampedArray, JS_GetUint8ClampedArrayData, js::GetUint8ClampedArrayLengthAndData, JS_NewUint8ClampedArray>' live across GC call at dom/canvas/WebGLTextureUpload.cpp:947
[task 2020-07-08T12:52:01.624Z] TEST-UNEXPECTED-FAIL | hazards | unrooted 'obj' of type 'JSObject*' live across GC call at js/xpconnect/src/XPCJSRuntime.cpp:3253
[task 2020-07-08T12:52:01.624Z] TEST-UNEXPECTED-FAIL | hazards | 2 rooting hazards detected
[task 2020-07-08T12:52:01.624Z] TinderboxPrint: documentation<br/><a href='https://wiki.mozilla.org/Javascript:Hazard_Builds#Diagnosing_a_rooting_hazards_failure'>static rooting hazard analysis failures</a>, visit "Inspect Task" link for hazard details
[task 2020-07-08T12:52:01.624Z] + onexit
[task 2020-07-08T12:52:01.624Z] + grab_artifacts /builds/worker/workspace/analysis /builds/worker/artifacts
[task 2020-07-08T12:52:01.624Z] + local analysis_dir
[task 2020-07-08T12:52:01.624Z] + analysis_dir=/builds/worker/workspace/analysis
...
Flags: needinfo?(jyavenard)
Pushed by jyavenard@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/236757acc073
P1. Remove RemodeDecoderManagerChild::CanSend method. r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/374598f9c37a
P2. Always initialise the RemoteDecoderChild manager thread. r=mjf
https://hg.mozilla.org/integration/autoland/rev/8f8174ba409d
P3. Have RemoteDecoderManagerChild use a TaskQueue over a media threadpool. r=mjf
Backout by rmaries@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b8fad3d28fcd
Backed out 5 changesets (bug 1650996, bug 1649974) for perma failures on Android 7.0. CLOSED TREE
Flags: needinfo?(jyavenard)
Pushed by jyavenard@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/51ceb0b7150c
P1. Remove RemodeDecoderManagerChild::CanSend method. r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/399f05bfe205
P2. Always initialise the RemoteDecoderChild manager thread. r=mjf
https://hg.mozilla.org/integration/autoland/rev/f3ab1da2cea8
P3. Have RemoteDecoderManagerChild use a TaskQueue over a media threadpool. r=mjf
Regressions: 1653638
Regressions: 1654493

Can we back this out of beta to address the regressions?

Flags: needinfo?(jyavenard)

(In reply to Julien Cristau [:jcristau] from comment #12)

Can we back this out of beta to address the regressions?

The fix is safe. Wouldn't that be a better solution?

Flags: needinfo?(jyavenard)

For 81 sure. For 80 I'm not convinced, and it's been sitting for 2 weeks already.

(In reply to Julien Cristau [:jcristau] from comment #14)

For 81 sure. For 80 I'm not convinced, and it's been sitting for 2 weeks already.

And there's 26 days to go before release.

There's much more risk to backout this change (as it comes in a long series of thread changes published in the days prior and following) than uplifting the fix which is just using a different thread in webaudio to prevent deadlock.

Regressions: 1656847
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: