Green up ASan browser-chrome

Status: RESOLVED WORKSFORME
Product: Core :: General
Reported: 5 years ago
Last updated: 4 years ago
People: Reporter: philor; Assignee: decoder
Tracking: Blocks: 1 bug; Keywords: sec-want
Platform: Trunk, x86_64, Linux
Whiteboard: [leave open]
Attachments: 1

(Reporter)

Description

5 years ago
We tried to get it to run to completion (at least sometimes) with https://hg.mozilla.org/mozilla-central/rev/e7439c6c1e81 disabling the frequently failing browser_thumbnails_background.js, but that didn't particularly green it up.

Based on the things that are failing (primarily a whole lot of toolkit/mozapps/extensions/ stuff), and the way they've failed before, I sort of vaguely suspect OOM.

One possible savior is bug 819963 - we're actually very close to the time when Mac debug browser-chrome will start turning red by taking over 9000 seconds (over 9000!) and being killed by buildbot, which we'll probably fix by ramming that through rather than fixing it.

Meantime, ASan b-c has gotten so horrible (the last time it was actually green on inbound was 9 hours ago) that I'm hiding it. That way we'll stop expecting it to run; only I'll see it and star it against this bug, rather than continuing to bombard people, mostly addonsmgr people, with failure bugspam that they can only fix by addressing the fact that we were already teetering on the OOM cliff before their tests even started running.
(Reporter)

Comment 1

5 years ago
A CLUE: while I was typing up the bureaucratic bug in the Tinderboxpushlog component about "Unhide when it gets better," I was about to type the "hidden on..." line including mozilla-aurora, since we merged to there from a horribly broken tree this morning, when I realized that I haven't been starring the ASan b-c failures there since the merge. Perfectly green, four runs in a row, which is three more in a row than we've managed in days, if not weeks, on the trunk.

So, does --disable-profiling, or one of the other changes between trunk and aurora that seem unlikely to have this effect, cause this problem to disappear?
(Reporter)

Updated

5 years ago
Blocks: 932164
(Reporter)

Comment 3

5 years ago
Maybe not profiling, since https://tbpl.mozilla.org/?showall=1&tree=Profiling builds with --disable-profiling, and while it's a bit better than inbound/central/fx-team, it's not quite perfect, nor as good as aurora.
(Assignee)

Comment 4

5 years ago
Do we have a regression range for this? When we started with ASan b-c, it was perfectly green all the time. So if you're saying you're having trouble for weeks now, why didn't we investigate earlier?

Is there a particular test going OOM, or is it random? I guess it's either a test or an option that makes the difference between aurora and central, but a regression range would really help.

Comment 5

5 years ago
It seems to have slowly deteriorated. One of the problems is that ASan browser-chrome is (obviously) only 1 of many browser-chrome jobs run per push, so even a drastic deterioration in its failure rate gets averaged out by all the rest when jumping from tree to tree starring in only-unstarred mode.

Comment 6

5 years ago
(The sheriffs are more than eager to back things out when they're spotted increasing a failure rate, and do so quite regularly, though on occasion the rate increased so gradually that this wasn't possible.)
(Assignee)

Comment 7

5 years ago
Assuming that memory-pressure increased in general, here's a try run for a smaller quarantine size on 2-4 GB memory builders: https://tbpl.mozilla.org/?tree=Try&rev=758951544a82

I am still curious, though, what makes the difference between aurora and inbound in this case.
(Assignee)

Comment 8

5 years ago
Assigning to myself.

Philor: I was trying to figure out the influence of profiling, but as far as I can see, ASan doesn't build with --enable-profiling (the regular nightly config has it; the nightly-asan config does not). Did we make this (or any other config change) somewhere else, outside of the mozconfigs?

That said, mochitest-bc was always close to the memory limits; it could be that some changes just increased memory usage a little, causing random failures. The try push I made should help in that case.
Assignee: nobody → choller
Blocks: 863846
Keywords: sec-want
(Reporter)

Comment 9

5 years ago
If you want a clearer example of why it's hard to see things exactly like this when they happen, take a look at bug 920976. Pick the instance that (in hindsight) seems to be the start, either the first one yesterday or the double-starred one on the 21st, open the log, and click the "push abc123" link at the top to get to tbpl for that push, then click the down-arrow. Was it actually because of sunfish's self-reviewed followup to bug 925729? Was it something in a merge from fx-team that only became broken when it met something else on inbound? Was it one of the pushes that hardly ran any tests at all?
(Reporter)

Comment 10

5 years ago
Oops, bug 920978, which is a rather neat example, not 976 whatever it is.
(Assignee)

Comment 11

5 years ago
Created attachment 824129 [details] [diff] [review]
asan-bc-oom.patch

Decrease the quarantine_size for mochitests running ASan by 20% to solve the memory issues on mochitest-bc.
Attachment #824129 - Flags: review?(ted)
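For reference, tuning the quarantine this way means setting ASan's allocator quarantine size through the ASAN_OPTIONS environment variable when the harness launches the browser. A minimal sketch of the idea (the 256 MB default, the variable names, and the launch step here are illustrative assumptions, not taken from the actual patch):

```python
import os

# Assumed default: ASan's allocator quarantine has historically defaulted
# to 256 MB; freed memory sits in the quarantine to catch use-after-free.
DEFAULT_QUARANTINE_BYTES = 256 * 1024 * 1024

# "Decrease the quarantine_size ... by 20%" -> keep 80% of the default,
# trading some use-after-free detection window for lower memory pressure.
reduced = DEFAULT_QUARANTINE_BYTES * 8 // 10

# ASAN_OPTIONS is a list of key=value pairs; the harness would export this
# into the browser's environment before spawning the process.
env = dict(os.environ)
env["ASAN_OPTIONS"] = "quarantine_size=%d" % reduced

print(env["ASAN_OPTIONS"])
```

The trade-off is that a smaller quarantine recycles freed memory sooner, so some use-after-free bugs become harder for ASan to catch; on 2-4 GB builders that was judged an acceptable price for keeping mochitest-bc under the OOM line.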
(Assignee)

Comment 12

5 years ago
Comment on attachment 824129 [details] [diff] [review]
asan-bc-oom.patch

rs=ted on IRC.
Attachment #824129 - Flags: review?(ted) → review+
(Reporter)

Comment 14

5 years ago
That's done a great job of exposing https://tbpl.mozilla.org/php/getParsedLog.php?id=29854349&tree=Mozilla-Inbound, which must have landed sometime between the parent of your try push and yesterday afternoon on inbound, where it's visible if you know to look past all the other failures.
Whiteboard: [leave open]

Updated

5 years ago
Blocks: 932781
Should we try to back this out now?
(Assignee)

Comment 17

5 years ago
Fwiw, I've checked the last pushes to mozilla-inbound and mochitest-bc was always green. Maybe the failure in comment 11 is a rare/unrelated intermittent? I suggest we unhide ASan mochitest-bc again, given that it's green now. The only thing we should maybe check before doing so is whether the OOM patch I made is still necessary, as Ryan pointed out.

I'm currently doing a try push with a backout of my OOM patch to check that.
(Reporter)

Comment 18

5 years ago
It's not at all rare; it's just not quite permaorange. But apparently people think that since ASan b-c is hidden for one bustage followed by another, that makes it fine to break it again, so I'm unhiding it to stop the pile-on.
(Reporter)

Updated

5 years ago
Depends on: 934641
(Reporter)

Updated

4 years ago
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → WORKSFORME