Perma [tier 2] linux1804-64-clang-trunk-qr opt mass failures that affect awsy and btime
Categories
(Testing :: General, defect, P2)
Tracking
(Not tracked)
People
(Reporter: imoraru, Unassigned)
References
Details
(Whiteboard: [stockwell unknown])
This started happening all at once and only on linux1804-64-clang-trunk-qr opt.
Comment 1 (Reporter)•8 months ago
Hi Joel! Can you please take a look at this? Maybe you can figure out what is going on here.
Thank you!
Comment 3•8 months ago
the push prior to the awsy failures seems to be where we turned on perf tests on clang builds. I don't know why we even run perf tests there, as this isn't shippable. According to perfherder, this has been running on m-c for >1 year, so why did we not have data for 20+ builds on m-c?
these tasks appear to be scheduled by a cron job:
Decision Task for cron job linux64-clang-trunk-perf
and even on previous commits this would run and generate target tasks, but those tasks wouldn't show up.
I am not aware of any other testing done on linux clang builds, so probably the builds are not functional in some way?
things to figure out:
- why did these just start running (june 19th was the last run, then a break for a week?)
- why do we run perf tests on here, but no unit tests?
- can we add to autoland and bisect down?
I will start with the perf tooling team, as they may understand the why and be aware of more history of these tests being turned on/off, or why they just started back up.
Comment 8•8 months ago
:jmaher, I'm not sure why they broke like that, or why the scheduling is odd. We don't monitor these much on our side.
:andi, have these clang-trunk builds/tests been useful for anything in the time that they've been running (~3 years now)? I'm wondering if we could turn them off.
Comment 9•8 months ago
(In reply to Greg Mierzwinski [:sparky] from comment #8)
> :andi, have these clang-trunk builds/tests been useful for anything in the time that they've been running (~3 years now)? I'm wondering if we could turn them off.
Comment 10•8 months ago
(In reply to Joel Maher ( :jmaher ) (UTC -8) from comment #3)
> things to figure out:
I can answer these.
> - why did these just start running (june 19th was the last run, then a break for a week?)
Because the build they depend on, build-linux64-plain-clang-trunk/opt, didn't happen because of bug 1903956.
> - why do we run perf tests on here, but no unit tests?
Because the goal was to test the evolution of performance as the clang/LLVM trunk progresses.
> - can we add to autoland and bisect down?
Backfills on these jobs are meaningless because they always pull the latest clang/LLVM trunk. And here, the bustage comes from a change there, that hasn't been reverted yet. Essentially, this is bug 1906026.
(In reply to Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout) from comment #9)
> :andi, have these clang-trunk builds/tests been useful for anything in the time that they've been running (~3 years now)? I'm wondering if we could turn them off.
I'm not sure they have. I don't even know if perf sheriffs are looking at regressions coming from those jobs, so the question would more be for them. That being said, since bug 1790918 and bug 1791454, we have these perf jobs running on shippable builds (which would be more representative) on the toolchains project branch, so the jobs on central are kind of redundant, although it could be argued they have some level of usefulness (first and foremost, being more regular, except when their upstream tasks are busted).
Comment 12•8 months ago
(In reply to Mike Hommey [:glandium] from comment #10)
> I'm not sure they have. I don't even know if perf sheriffs are looking at regressions coming from those jobs, so the question would more be for them. That being said, since bug 1790918 and bug 1791454, we have these perf jobs running on shippable builds (which would be more representative) on the toolchains project branch, so the jobs on central are kind of redundant, although it could be argued they have some level of usefulness (first and foremost, being more regular, except when their upstream tasks are busted).
These don't produce alerts that the sheriffs would monitor since they don't run on autoland (where our regression detection is running). Could we disable the tests in either toolchains or m-c (preferably m-c)? I would suggest we disable them in both branches unless there are plans to set up some sort of manual monitoring for them, or unless this already exists.
Comment 13•8 months ago
It's also running on beta, which is presumably tracked(?).
Comment 15•8 months ago
The severity field is not set for this bug.
:jmaher, could you have a look please?
For more information, please visit BugBot documentation.
Comment 16•8 months ago
it seems like these are not monitored and are broken; we should disable these, and if they are fixed and running reliably again and there is a desire to make these useful, they can be turned back on.
:sparky, are you up for making the patch to turn these off?
Comment 17•8 months ago
Comment 18•8 months ago
Great that they're not broken anymore! I think there's still a question about whether we should keep running them regardless of failures.
(In reply to Mike Hommey [:glandium] from comment #13)
> It's also running on beta, which is presumably tracked(?).
I don't see them running on mozilla-beta (I checked July+some of June). If they run very infrequently then it will take a long time before any alerts get generated (we need 12 data points before/after a change to trigger an alert).
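To make the "12 data points" remark concrete, here is a rough back-of-the-envelope sketch of how long an infrequent job would take to collect enough post-change data. The function name and the fixed run interval are assumptions made for illustration; this is not how Perfherder itself computes anything, only the arithmetic implied above.

```python
# Assumption taken from the comment above: an alert needs 12 data points
# on each side of a change. Everything else here is hypothetical.
POINTS_NEEDED = 12


def days_until_alert_possible(days_between_runs: int,
                              points_needed: int = POINTS_NEEDED) -> int:
    """Rough estimate of days needed to gather enough post-change data.

    Assumes the job runs at a fixed interval and counts one full interval
    per required data point.
    """
    if days_between_runs <= 0:
        raise ValueError("days_between_runs must be positive")
    return points_needed * days_between_runs


if __name__ == "__main__":
    for interval in (1, 7):
        days = days_until_alert_possible(interval)
        print(f"one run every {interval} day(s): about {days} days")
```

Under these assumptions, a job that runs once a week would need on the order of 12 weeks of data after a change before an alert could be generated.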
The load from these tests isn't very much atm with only m-c tests, but it still seems wasteful if we don't use the data for anything. :glandium, if you want to monitor the results from these tests, could you set up a dashboard for yourself in redash and post a link to it here? Alternatively, we can disable these.
Comment 19•8 months ago
I guess disable them, but that will have the side effect of disabling the build-linux64-plain-clang-trunk/opt job too, which is useful.