1905006 - Perma [tier 2] linux1804-64-clang-trunk-qr opt mass failures that affect awsy and btime

Iulian Moraru

Reporter

Description

•

8 months ago

•

Edited

This started happening all at once and only on linux1804-64-clang-trunk-qr opt.

Iulian Moraru

Reporter

Updated

•

8 months ago

Summary: Perma linux1804-64-clang-trunk-qr opt mass failures that affect awsy and btime → Perma [tier 2] linux1804-64-clang-trunk-qr opt mass failures that affect awsy and btime

Iulian Moraru

Reporter

Comment 1

•

8 months ago

Hi Joel! Can you please take a look at this? Maybe you can figure out what is going on here.
Thank you!

Flags: needinfo?(jmaher)

Comment hidden (Intermittent Failures Robot)

Joel Maher ( :jmaher ) (UTC -8)

Comment 3

•

8 months ago

the push prior to the awsy failures seems to be where we turned on perf tests on clang builds. I don't know why we even run perf tests there, as this isn't shippable. According to perfherder, this has been running on m-c for >1 year, so why did we not have data for 20+ builds on m-c?

these tasks appear to be scheduled by a cron job:
Decision Task for cron job linux64-clang-trunk-perf

and even on previous commits this would run and generate target tasks, but those tasks wouldn't show up.

I am not aware of any other testing done on linux clang builds, so probably the builds are not functional in some way?

things to figure out:

why did these just start running (june 19th was the last run, then a break for a week?)
why do we run perf tests on here, but no unit tests?
can we add to autoland and bisect down?

I will start with the perf tooling team, as they could help us understand the why and may be aware of more history of on/off tests or why they just started back up.

Flags: needinfo?(jmaher) → needinfo?(gmierz2)

Comment hidden (Intermittent Failures Robot)

Greg Mierzwinski [:sparky]

Comment 8

•

8 months ago

:jmaher, I'm not sure why they broke like that, or why the scheduling is odd. We don't monitor these much on our side.

:andi, have these clang-trunk builds/tests been useful for anything in the time that they've been running (~3 years now)? I'm wondering if we could turn them off.

Flags: needinfo?(gmierz2) → needinfo?(bpostelnicu)

Comment 9

•

8 months ago

(In reply to Greg Mierzwinski [:sparky] from comment #8)

:andi, have these clang-trunk builds/tests been useful for anything in the time that they've been running (~3 years now)? I'm wondering if we could turn them off.

Flags: needinfo?(bpostelnicu) → needinfo?(mh+mozilla)

Mike Hommey [:glandium]

Comment 10

•

8 months ago

•

Edited

(In reply to Joel Maher ( :jmaher ) (UTC -8) from comment #3)

things to figure out:

I can answer these.

why did these just start running (june 19th was the last run, then a break for a week?)

Because the build they depend on, build-linux64-plain-clang-trunk/opt, didn't happen because of bug 1903956.

why do we run perf tests on here, but no unit tests?

Because the goal was to test the evolution of performance as the clang/LLVM trunk progresses.

can we add to autoland and bisect down?

Backfills on these jobs are meaningless because they always pull the latest clang/LLVM trunk. And here, the bustage comes from a change there, that hasn't been reverted yet. Essentially, this is bug 1906026.

(In reply to Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout) from comment #9)

:andi, have these clang-trunk builds/tests been useful for anything in the time that they've been running (~3 years now)? I'm wondering if we could turn them off.

I'm not sure they have. I don't even know if perf sheriffs are looking at regressions coming from those jobs, so the question would more be for them. That being said, since bug 1790918 and bug 1791454, we have these perf jobs running on shippable builds (which would be more representative) on the toolchains project branch, so the jobs on central are kind of redundant, although it could be argued they have some level of usefulness (first and foremost, being more regular, except when their upstream tasks are busted).

Flags: needinfo?(mh+mozilla)

Comment hidden (Intermittent Failures Robot)

Greg Mierzwinski [:sparky]

Comment 12

•

8 months ago

(In reply to Mike Hommey [:glandium] from comment #10)

I'm not sure they have. I don't even know if perf sheriffs are looking at regressions coming from those jobs, so the question would more be for them. That being said, since bug 1790918 and bug 1791454, we have these perf jobs running on shippable builds (which would be more representative) on the toolchains project branch, so the jobs on central are kind of redundant, although it could be argued they have some level of usefulness (first and foremost, being more regular, except when their upstream tasks are busted).

These don't produce alerts that the sheriffs would monitor since they don't run on autoland (where our regression detection is running). Could we disable the tests in either toolchains or m-c (preferably m-c)? I would suggest we disable them in both branches unless there are plans to set up some sort of manual monitoring for them, or unless this already exists.

Flags: needinfo?(mh+mozilla)

Greg Mierzwinski [:sparky]

Updated

•

8 months ago

Comment 13

•

8 months ago

It's also running on beta, which is presumably tracked(?).

Flags: needinfo?(mh+mozilla)

Comment hidden (Intermittent Failures Robot)

BugBot [:suhaib / :marco/ :calixte]

Comment 15

•

8 months ago

The severity field is not set for this bug.
:jmaher, could you have a look please?

For more information, please visit BugBot documentation.

Flags: needinfo?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Comment 16

•

8 months ago

it seems like these are not monitored and are broken; we should disable these, and if they are fixed and running reliably again and there is a desire to make these useful, they can be turned back on.

:sparky, are you up for making the patch to turn these off?

Severity: -- → S3

Flags: needinfo?(jmaher) → needinfo?(gmierz2)

Priority: -- → P2

Mike Hommey [:glandium]

Comment 17

•

8 months ago

•

Edited

They're not broken anymore: https://treeherder.mozilla.org/jobs?repo=mozilla-central&searchStr=trunk&revision=162448b8b7cdda02d8b081277b29e0b675567876

Greg Mierzwinski [:sparky]

Comment 18

•

8 months ago

Great that they're not broken anymore! I think there's still a question about whether we should keep running them regardless of failures.

(In reply to Mike Hommey [:glandium] from comment #13)

It's also running on beta, which is presumably tracked(?).

I don't see them running on mozilla-beta (I checked July+some of June). If they run very infrequently then it will take a long time before any alerts get generated (we need 12 data points before/after a change to trigger an alert).
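To make the data-point requirement above concrete, here is a rough sketch of the arithmetic. It assumes the threshold is a fixed 12 points on each side of a change, as stated in this comment; the function names and the runs-per-week parameter are hypothetical and are not part of Perfherder.

```python
# Hypothetical illustration only; not Perfherder's actual implementation.
MIN_POINTS = 12  # threshold quoted in the comment above


def can_alert(points_before: int, points_after: int,
              min_points: int = MIN_POINTS) -> bool:
    """True when both sides of a change have enough data points."""
    return points_before >= min_points and points_after >= min_points


def days_until_alertable(runs_per_week: float, points_after: int = 0,
                         min_points: int = MIN_POINTS) -> float:
    """Estimate days until enough post-change points exist.

    Assumes each run contributes exactly one data point.
    """
    missing = max(0, min_points - points_after)
    return missing / runs_per_week * 7
```

Under these assumptions, a job that runs once a week would need about 84 days after a change before an alert could be generated.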

The load from these tests isn't very much atm with only m-c tests, but it still seems wasteful if we don't use the data for anything. :glandium, if you want to monitor the results from these tests, could you set up a dashboard for yourself in redash and post a link to it here? Alternatively, we can disable these.

Flags: needinfo?(gmierz2) → needinfo?(mh+mozilla)

Mike Hommey [:glandium]

Comment 19

•

8 months ago

I guess disable them, but that will have the side effect of disabling the build-linux64-plain-clang-trunk/opt job too, which is useful.

Flags: needinfo?(mh+mozilla)

Cosmin Sabou [:CosminS]

Updated

•

7 months ago

Whiteboard: [stockwell disable-recommended] → [stockwell unknown]

Categories

(Testing :: General, defect, P2)

Tracking

(Not tracked)

People

(Reporter: imoraru, Unassigned)

References

Details

(Whiteboard: [stockwell unknown])

Crash Data

Security

(public)

User Story
