5.2% regression on Webassembly-embenchen-compile (WASM optimizing) around 3 Oct 2024 on Windows-mozilla-central
Categories
(Core :: JavaScript: WebAssembly, defect, P1)
Tracking
()
| Release | Tracking | Status |
|---|---|---|
| firefox-esr128 | --- | unaffected |
| firefox131 | --- | unaffected |
| firefox132 | --- | unaffected |
| firefox133 | --- | fixed |
People
(Reporter: mayankleoboy1, Assigned: bas.schouten)
References
(Blocks 1 open bug, Regression)
Details
(Keywords: regression)
Attachments
(1 file)
The graph says it all, I think.
There look to be some corresponding improvements in this range, but the dates seem somewhat off.
Linux and Mac for the same period.
Not sure if this is a big enough change, or if it is even worth over-rotating on these sub-benchmarks.
Comment 1•1 year ago
The Bugbug bot thinks this bug should belong to the 'Core::JavaScript: WebAssembly' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.
Comment 2•1 year ago
Hm, do you see anything interesting in that regression range?
Comment 3 (Reporter)•1 year ago
JS-related bugs in the range are: bug 1921963, bug 1904995, bug 1904957 (may impact compiling on multiple threads).
But nothing looks interesting, which is what I don't understand. And because these tests run on m-c, I don't know how to backfill.
The datapoint just before the regression contains bug 1919025 part 3 (removal of the old dominator stuff). But even that shouldn't affect things.
So I can't guess anything.
Comment 4•1 year ago
Yeah, I asked because I didn't see anything obvious either :)
It could be bug 1919025, because that also switched to the new code, but a 5% regression from that would be a bit unexpected, and it would also mean there was still one "good" data point after it...
Comment 5 (Reporter)•1 year ago
The numbers have improved, probably because of bug 1915863.
But they are still not back to the previous baseline.
Comment 6•1 year ago
If I'm reading the graphs right, it looks like the compile regression was just on Windows? If so, that seems like a data point in favor of bug 1904957, which should only affect Windows. If that bug changes the configuration of our thread pool for compilation, that could definitely cause a regression here.
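As a rough, purely illustrative sketch (every name below is made up; this is not the TaskController code): if the pool size is derived from performance cores only rather than from all logical processors, background wasm compilation gets fewer workers, which would show up on the compile subtests.

```cpp
// Hypothetical sketch of a pool-sizing policy: compare a count derived from
// all logical processors with one derived from "performance" cores only.
#include <algorithm>
#include <cstdio>
#include <thread>

struct CpuTopology {
  unsigned logicalProcessors;  // all hardware threads
  unsigned performanceCores;   // P-cores only; 0 if the topology is unknown
};

// Invented policy for illustration: prefer P-cores when they are known.
unsigned PoolThreadCount(const CpuTopology& cpu) {
  unsigned base =
      cpu.performanceCores ? cpu.performanceCores : cpu.logicalProcessors;
  return std::max(1u, base - 1);  // leave one core for the main thread
}

int main() {
  unsigned logical = std::max(1u, std::thread::hardware_concurrency());
  CpuTopology unknownTopology{logical, 0};
  CpuTopology hybridTopology{logical, logical / 2};  // pretend half are P-cores

  std::printf("workers without topology info: %u\n",
              PoolThreadCount(unknownTopology));
  std::printf("workers sized from P-cores:    %u\n",
              PoolThreadCount(hybridTopology));
  return 0;
}
```

If a change of this shape suddenly shrinks the number of compile workers on Windows hybrid CPUs, a regression on the compile subtests is exactly the kind of effect you would expect.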
Comment 7•1 year ago
I did a backout of bug 1904957 and confirmed that it is the source of this regression [1] [2].
Bas, do you think there's anything we can do here to fix this, or is this expected?
[1] https://treeherder.mozilla.org/perfherder/comparesubtest?originalProject=mozilla-central&newProject=try&newRevision=082549aad8f58830b3393a7ce96c9f21695453b7&originalSignature=87302&newSignature=5133239&framework=13&application=firefox&originalRevision=11ab9645846ff996343f1c5bdb0c65165724a1b6&page=1#tableLink-row-472801410
[2] https://treeherder.mozilla.org/jobs?repo=try&revision=082549aad8f58830b3393a7ce96c9f21695453b7&selectedTaskRun=H_JmexgtRg2ksk-4NANXdw.0
Comment 8•1 year ago
Set release status flags based on info from the regressing bug 1904957
Comment 10•1 year ago
Looking at the backout commit, either CreateHeterogeneousCpuInfo() is very expensive or TaskController::GetPoolThreadCount() is called a lot. Neither of which seems particularly plausible to me. We could at least check whether the latter is true, though.
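A cheap way to test the "called a lot" half of that hypothesis is a process-wide counter dumped at shutdown. The sketch below is generic, not a TaskController patch; the instrumented function is a stand-in.

```cpp
// Generic call-counting sketch: an atomic counter incremented on every call,
// reported by a static object's destructor at process exit.
#include <atomic>
#include <cstdint>
#include <cstdio>

static std::atomic<uint64_t> sCalls{0};

int GetPoolThreadCountInstrumented() {
  sCalls.fetch_add(1, std::memory_order_relaxed);
  return 8;  // stand-in for the real computation
}

struct CallCountReporter {
  ~CallCountReporter() {
    std::fprintf(stderr, "pool-thread-count calls: %llu\n",
                 static_cast<unsigned long long>(sCalls.load()));
  }
};
static CallCountReporter sReporter;

int main() {
  for (int i = 0; i < 1000; ++i) {
    (void)GetPoolThreadCountInstrumented();
  }
  return 0;
}
```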
Comment 11•1 year ago
(In reply to Julian Seward [:jseward] from comment #10)
> Looking at the backout commit, either CreateHeterogeneousCpuInfo() is very
> expensive or TaskController::GetPoolThreadCount() is called a lot. Neither of
> which seems particularly plausible to me. We could at least check whether the
> latter is true, though.

I think it's more likely that implementing CreateHeterogeneousCpuInfo() for Windows changed the number of background threads we have available to compile wasm code on.
Comment 12 (Assignee)•1 year ago
(In reply to Ryan Hunt [:rhunt] from comment #11)
> (In reply to Julian Seward [:jseward] from comment #10)
> > Looking at the backout commit, either CreateHeterogeneousCpuInfo() is very
> > expensive or TaskController::GetPoolThreadCount() is called a lot. Neither of
> > which seems particularly plausible to me. We could at least check whether the
> > latter is true, though.
> I think it's more likely that implementing CreateHeterogeneousCpuInfo() for Windows changed the number of background threads we have available to compile wasm code on.

I agree. This is interesting. I wonder if we'll have to treat E-cores a little differently on Windows to get the best result. What do you think, Olli?
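For reference, Windows exposes a per-core EfficiencyClass that heterogeneous-CPU code can use to tell P-cores from E-cores. The sketch below is generic (it is not Gecko's CreateHeterogeneousCpuInfo()): it enumerates physical cores via GetLogicalProcessorInformationEx and counts those in the highest efficiency class.

```cpp
// Windows reports an EfficiencyClass per physical core; on hybrid CPUs the
// highest class corresponds to the performance (P) cores.
#include <windows.h>

#include <cstdio>
#include <vector>

int main() {
  DWORD len = 0;
  // First call fails with ERROR_INSUFFICIENT_BUFFER and reports the size.
  GetLogicalProcessorInformationEx(RelationProcessorCore, nullptr, &len);
  std::vector<BYTE> buf(len);
  if (!GetLogicalProcessorInformationEx(
          RelationProcessorCore,
          reinterpret_cast<PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX>(buf.data()),
          &len)) {
    return 1;
  }

  unsigned totalCores = 0;
  BYTE maxClass = 0;
  // First pass: count cores and find the highest efficiency class.
  for (DWORD off = 0; off < len;) {
    auto* rec = reinterpret_cast<PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX>(
        buf.data() + off);
    if (rec->Relationship == RelationProcessorCore) {
      ++totalCores;
      if (rec->Processor.EfficiencyClass > maxClass) {
        maxClass = rec->Processor.EfficiencyClass;
      }
    }
    off += rec->Size;
  }

  // Second pass: count the cores in that highest class (the P-cores).
  unsigned pCores = 0;
  for (DWORD off = 0; off < len;) {
    auto* rec = reinterpret_cast<PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX>(
        buf.data() + off);
    if (rec->Relationship == RelationProcessorCore &&
        rec->Processor.EfficiencyClass == maxClass) {
      ++pCores;
    }
    off += rec->Size;
  }

  std::printf("physical cores: %u, highest-class (P) cores: %u\n", totalCores,
              pCores);
  return 0;
}
```

On a non-hybrid CPU every core reports class 0, so the two counts match; how (or whether) this information should feed the pool size on Windows is the open question in this bug.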
Comment 13•1 year ago
Also, it looks like the wasm-misc-baseline compile subtest regressed by 8.6% too (from the backout run I posted). Neither of these is a production configuration, because each disables one of our compilers. I don't have results for our production config (just plain wasm-misc), but I'm triggering some tasks for it. I'd be surprised, though, if it affected both the baseline and ion configs but didn't affect that one.
Comment 14•1 year ago
I'd back out the TaskController change for now.
Do the WASM tests run at all on Apple Silicon? Since the TaskController change was initially only for Apple Silicon Macs / sp3.
Updated•1 year ago
Comment 15•1 year ago
(In reply to Olli Pettay [:smaug][bugs@pettay.fi] from comment #14)
> I'd back out the TaskController change for now.
> Do the WASM tests run at all on Apple Silicon? Since the TaskController change was initially only for Apple Silicon Macs / sp3.

Fx133 is now in soft code freeze and will go to beta on 2024-10-28.
Is the plan here to back out for now or something else?
Comment 16•1 year ago
(In reply to Olli Pettay [:smaug][bugs@pettay.fi] from comment #14)
> I'd back out the TaskController change for now.
> Do the WASM tests run at all on Apple Silicon? Since the TaskController change was initially only for Apple Silicon Macs / sp3.

I don't think the wasm benchmarks run on Apple Silicon. Looks like the task [1] runs on Intel Mac Minis.
[1] test-macosx1015-64-shippable-qr/opt-browsertime-benchmark-wasm-firefox
Comment 17•1 year ago
I think backing this out makes sense if there's no immediate fix. Bas, can you back this out?
I was also able to confirm from retriggers that this affected our stock configuration, not just the baseline-only/ion-only variants.
Comment 18•1 year ago
We shouldn't need to back out anything other than https://searchfox.org/mozilla-central/diff/127f29b137640831d12c141acb620510cf01fcd2/xpcom/threads/TaskController.cpp#81
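The shape of the change under discussion, keeping the heterogeneous sizing where it was originally wanted (Apple Silicon) and restoring the old behaviour elsewhere, might look roughly like the sketch below. This is a hypothetical illustration, not the actual TaskController.cpp line or its backout, and every name in it is invented.

```cpp
// Hypothetical illustration only; not the real TaskController.cpp change.
// Idea: use heterogeneous-CPU ("big core") sizing only on macOS, where it was
// introduced for Apple Silicon, and keep the plain logical-processor count
// elsewhere until the Windows wasm-compile regression is understood.
#include <algorithm>
#include <cstdio>
#include <thread>

struct HeterogeneousCpuInfo {
  unsigned bigCores;     // performance cores
  unsigned littleCores;  // efficiency cores
};

// Stand-in for an OS topology query.
static HeterogeneousCpuInfo QueryCpuTopology() { return {4, 4}; }

static unsigned PoolThreadCount() {
#if defined(__APPLE__)
  // Heterogeneous sizing: budget background work for the big cores only.
  HeterogeneousCpuInfo info = QueryCpuTopology();
  return std::max(1u, info.bigCores);
#else
  // Previous behaviour: one worker per logical processor, minus one.
  unsigned logical = std::max(2u, std::thread::hardware_concurrency());
  return logical - 1;
#endif
}

int main() {
  std::printf("pool threads: %u\n", PoolThreadCount());
  return 0;
}
```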
Comment 19•1 year ago
Updated•1 year ago
Comment 20•1 year ago
Comment 21 (bugherder)•1 year ago
Updated•9 months ago