40.93 - 18.9% addMarN time_duration / addMarN time_duration (Linux) regression on Wed April 3 2024
Categories
(Core :: Storage: IndexedDB, defect, P3)
Tracking
Release | Tracking | Status
---|---|---
firefox-esr115 | --- | unaffected
firefox126 | --- | fixed
firefox127 | --- | fixed
firefox128 | --- | fixed
People
(Reporter: aesanu, Assigned: jstutte)
References
(Regression)
Details
(Keywords: perf, perf-alert, regression)
Perfherder has detected a browsertime performance regression from push 58f4bf812f636518abbad47cfd558fb59315d4b5. As author of one of the patches included in that push, we need your help to address this regression.
Regressions:
Ratio | Test | Platform | Options | Absolute values (old vs new) | Performance Profiles |
---|---|---|---|---|---|
41% | addMarN time_duration | linux1804-64-shippable-qr | fission webrender | 18,524.29 -> 26,105.71 | Before/After |
19% | addMarN time_duration | linux1804-64-shippable-qr | fission webrender | 21,685.95 -> 25,783.67 | Before/After |
Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the patch(es) may be backed out in accordance with our regression policy.
If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.
You can run these tests on try with ./mach try perf --alert 42229
For more information on performance sheriffing please see our FAQ.
Reporter
Updated•10 months ago

Assignee
Comment 1•10 months ago
IIUC this does not happen on Windows, that is, the test is most likely emphasizing an underlying problem with nsThreadPool on Linux.
The high noise in the Linux test values both before and after the patch landed is also notable compared to the flat Windows line, but the upward tick after landing is real, I'd say.
Looking at one of the profiles, it seems that the test piles up some work that is then thrown at the IndexedDB thread pool all at once - but serialized, that is, all the events most likely come through the same connection. The difference before and after the patch is that before, all events were hitting the same single thread; now we bounce between the two threads we keep alive as idle threads. Though this might sound bad, it is not immediately clear why it would increase the overhead in any measurable way. And in fact on Windows it does not, even though the profile shows the same bouncing between two threads there as well.
Most of the extra time seems to be spent inside pthread_cond_wait_timed, so it is largely out of our direct control apart from signaling new events in time (which happens at the same moment on Windows and Linux, AFAICS).
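To make the bouncing concrete, here is a minimal, simplified model (plain C++, not Gecko code) of how a pool like nsThreadPool behaves under this load: idle workers block on a timed condition-variable wait, and each dispatch wakes an arbitrary idle worker, so consecutive serialized events from one connection can land on alternating threads. The class and all names are illustrative assumptions only.

```cpp
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Toy model of a thread pool whose idle workers wait on a timed condition
// variable; Dispatch() wakes an arbitrary idle worker, so back-to-back
// serialized events can alternate between workers.
class MiniPool {
 public:
  explicit MiniPool(size_t aWorkers) {
    for (size_t i = 0; i < aWorkers; ++i) {
      mThreads.emplace_back([this] { WorkerLoop(); });
    }
  }

  ~MiniPool() {
    {
      std::lock_guard<std::mutex> lock(mMutex);
      mShutdown = true;
    }
    mCond.notify_all();
    for (auto& t : mThreads) {
      t.join();
    }
  }

  void Dispatch(std::function<void()> aTask) {
    {
      std::lock_guard<std::mutex> lock(mMutex);
      mQueue.push_back(std::move(aTask));
    }
    mCond.notify_one();  // wakes *some* idle worker, not a particular one
  }

 private:
  void WorkerLoop() {
    std::unique_lock<std::mutex> lock(mMutex);
    while (!mShutdown) {
      if (mQueue.empty()) {
        // Idle wait with a timeout, roughly where the timed condition
        // variable wait seen in the profiles comes from.
        mCond.wait_for(lock, std::chrono::seconds(30));
        continue;
      }
      std::function<void()> task = std::move(mQueue.front());
      mQueue.pop_front();
      lock.unlock();
      task();  // run the event outside the lock
      lock.lock();
    }
  }

  std::mutex mMutex;
  std::condition_variable mCond;
  std::deque<std::function<void()>> mQueue;
  std::vector<std::thread> mThreads;
  bool mShutdown = false;
};
```

With two workers and one connection dispatching events back to back, successive tasks typically run on alternating workers; with a single dedicated thread they would all run on the same one.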
IIUC all the Linux perf tests are running on dedicated machines with the following spec:
1 Intel E3-1585Lv5 3.0GHz CPU
8GB DDR4 2400MHz RAM
which gives us 4 cores / 8 threads. The test creates load on at most three threads, so this should hopefully not cause any problems with context-switching overhead between threads on the same CPU core. Interestingly, repeating the same test locally on a 12-core / 24-thread Linux machine confirms roughly the same difference between keeping just 1 idle thread and 2 (the current value) for the thread pool. Performance is even worse if we keep 4 idle threads. The worst case, however, is keeping no idle threads at all, as that makes us spin threads up and down all the time.
Local machine: AMD Ryzen 9 3900XT 12-Core Processor
Command: mach raptor --browsertime -t addMarN --post-startup-delay=1000
---
Old dedicated thread
08:18:10 INFO - raptor-browsertime Info: CPU / Power usage: 11240
08:18:10 INFO - raptor-browsertime Info: Time duration was 8718.099999999999
08:19:08 INFO - raptor-browsertime Info: CPU / Power usage: 10617
08:19:09 INFO - raptor-browsertime Info: Time duration was 8369.46
---
Idle: 2 (current default)
20:08:54 INFO - raptor-browsertime Info: CPU / Power usage: 11867
20:08:54 INFO - raptor-browsertime Info: Time duration was 9241.9
20:09:36 INFO - raptor-browsertime Info: CPU / Power usage: 12431
20:09:36 INFO - raptor-browsertime Info: Time duration was 9557.88
---
Idle: 1
19:56:29 INFO - raptor-browsertime Info: CPU / Power usage: 10282
19:56:29 INFO - raptor-browsertime Info: Time duration was 7969.180000000001
20:11:51 INFO - raptor-browsertime Info: CPU / Power usage: 10891
20:11:51 INFO - raptor-browsertime Info: Time duration was 8427.64
20:12:33 INFO - raptor-browsertime Info: CPU / Power usage: 10690
20:12:33 INFO - raptor-browsertime Info: Time duration was 8219.84
---
Idle: 4
20:14:36 INFO - raptor-browsertime Info: CPU / Power usage: 12811
20:14:36 INFO - raptor-browsertime Info: Time duration was 9909.62
20:15:28 INFO - raptor-browsertime Info: CPU / Power usage: 12873
20:15:28 INFO - raptor-browsertime Info: Time duration was 9985.52
---
Idle: 0
20:19:49 INFO - raptor-browsertime Info: CPU / Power usage: 20948
20:19:49 INFO - raptor-browsertime Info: Time duration was 12701.56
So there seems to be something here we might want to investigate about the behavior of nsThreadPool on Linux. Maybe the findings could also shed some light on the noise we see in general in Linux execution times. This would probably deserve its own XPCOM bug.
But I do not think we should refrain from using thread pools here. The underlying problem can hit any thread pool under a similar load and existed before. This test is also not very representative of the load we normally expect on this thread pool, and in fact most other tests do not show any noticeable problems. At most we might want to experiment with different idle settings for this pool (see the sketch below), but that might be counterproductive for other scenarios and/or on other OSes.
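For reference, a hedged sketch of what such an experiment could look like, using the nsIThreadPool attributes (threadLimit, idleThreadLimit, idleThreadTimeout). The concrete values and the pool name are illustrative assumptions, not a proposed fix.

```cpp
// Sketch only: create a pool via the XPCOM contract ID and tweak its
// idle-thread settings to compare the behaviors measured above.
#include "nsCOMPtr.h"
#include "nsComponentManagerUtils.h"
#include "nsIThreadPool.h"
#include "nsString.h"
#include "nsThreadUtils.h"

static already_AddRefed<nsIThreadPool> MakeExperimentalPool() {
  nsCOMPtr<nsIThreadPool> pool =
      do_CreateInstance("@mozilla.org/thread-pool;1");
  if (!pool) {
    return nullptr;
  }
  pool->SetName("IDBExperiment"_ns);      // assumed name, for the profiler
  pool->SetThreadLimit(4);                // hard cap on workers
  pool->SetIdleThreadLimit(1);            // keep only one idle thread around
  pool->SetIdleThreadTimeout(30 * 1000);  // ms before an idle thread exits
  return pool.forget();
}

// Dispatching work is unchanged regardless of the idle settings, e.g.:
//   pool->Dispatch(NS_NewRunnableFunction("Example", [] { /* work */ }),
//                  NS_DISPATCH_NORMAL);
```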
Comment 2•10 months ago

Comment 3•10 months ago
"addMab1 time_duration opt fission webrender" is about 10% better with own thread management (15 test runs).
"addMarN time_duration opt fission webrender" is almost 25% better with own thread management (20 test runs).
Assignee
Comment 4•10 months ago
(In reply to Jan Varga [:janv] from comment #3)
> "addMab1 time_duration opt fission webrender" is about 10% better with own thread management (15 test runs).
> "addMarN time_duration opt fission webrender" is almost 25% better with own thread management (20 test runs).

Weird that this is not visible in the graph? And also not in the addMab1 graphs.
Comment 5•10 months ago
As discussed offline, the old custom thread pool was handling all runnables for a connection in one single thread and keeping the threads idle for some time before joining them. These features might contribute to the performance in this particular test.
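A minimal sketch (plain C++, not the actual Gecko code) of the serialization property described above: all runnables for one connection run in order and never overlap, even though the underlying executor is a shared pool. The class name and the Executor abstraction are illustrative assumptions; in Gecko a serial event target layered over the pool plays this role.

```cpp
#include <deque>
#include <functional>
#include <mutex>

// Toy model of a per-connection serial queue layered over a shared pool:
// at most one task from this queue is ever in flight in the pool, so tasks
// stay in FIFO order and never run concurrently with each other.
class SerialQueue {
 public:
  // The executor is assumed to queue work asynchronously (as a real thread
  // pool does), never running it synchronously inside the dispatch call.
  using Executor = std::function<void(std::function<void()>)>;

  explicit SerialQueue(Executor aPoolDispatch)
      : mPoolDispatch(std::move(aPoolDispatch)) {}

  void Dispatch(std::function<void()> aTask) {
    std::lock_guard<std::mutex> lock(mMutex);
    mPending.push_back(std::move(aTask));
    if (!mRunning) {
      mRunning = true;
      ScheduleNextLocked();
    }
  }

 private:
  // Called with mMutex held: hands exactly one pending task to the pool.
  void ScheduleNextLocked() {
    std::function<void()> task = std::move(mPending.front());
    mPending.pop_front();
    mPoolDispatch([this, task = std::move(task)] {
      task();  // run the connection's next event
      std::lock_guard<std::mutex> lock(mMutex);
      if (mPending.empty()) {
        mRunning = false;
      } else {
        ScheduleNextLocked();  // keep draining, one task at a time
      }
    });
  }

  Executor mPoolDispatch;
  std::mutex mMutex;
  std::deque<std::function<void()>> mPending;
  bool mRunning = false;
};
```

Note that the old custom pool additionally pinned each connection to one physical thread; this sketch only preserves ordering, not thread affinity.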
Updated•10 months ago

Comment 6•10 months ago
Setting Fx126 to fixed: the regressing patch was backed out of beta.
https://bugzilla.mozilla.org/show_bug.cgi?id=1791767#c58
Assignee
Comment 7•9 months ago
It seems bug 1791767 fixed bug 1878149 (see bug 1893807).
Comment 8•9 months ago
backout bugherder uplift
https://hg.mozilla.org/releases/mozilla-beta/rev/1d48046b04e7
Fx127 fixed: the regressing patch was backed out of beta
Assignee
Comment 9•8 months ago
After bug 1891664 landed, I can see a slight downward trend, though the noise in these test results is very high. We probably need some more data points to really see it. But from local testing I do not think we should wait for bug 1897072 or other improvements; we should just let bug 1791767 ride now.
Updated•8 months ago

Assignee
Comment 10•8 months ago
(In reply to Jens Stutte [:jstutte] from comment #9)
> After bug 1891664 landed, I can see a slight downward trend, though the noise in these test results is very high. We probably need some more data points to really see it. But from local testing I do not think we should wait for bug 1897072 or other improvements; we should just let bug 1791767 ride now.
If I read the data correctly, before bug 1891664 the results fell mostly in the corridor between 20-30k, and after it landed mostly between 15-25k. Before bug 1791767 landed we were mostly in roughly the same corridor (15-20k), even if perhaps a small percentage better than now.
I thus think we recovered most if not all of the regression. I assume we did not get a perf alert because the first data points after landing were close to the upper limit of that corridor (probably for random reasons).
Assignee
Comment 13•8 months ago
I see this untriaged alert, which refers to a slightly later push but contains exactly the expected result. I do not know exactly how those alerts work, but I am sufficiently confident to close this out.
Fixed by bug 1891664.
Updated•8 months ago

Updated•8 months ago