We tried to get it to run to completion (at least sometimes) by disabling the frequently failing browser_thumbnails_background.js in https://hg.mozilla.org/mozilla-central/rev/e7439c6c1e81, but that didn't particularly green it up. Based on what's failing (primarily a whole lot of toolkit/mozapps/extensions/ stuff) and the way those tests have failed before, I vaguely suspect OOM. One possible savior is bug 819963 - we're very close to the point where Mac debug browser-chrome starts turning red by taking over 9000 seconds (over 9000!) and being killed by buildbot, which we'll probably fix by ramming that bug through rather than fixing the runtime. Meantime, ASan b-c has gotten so horrible (the last time it was actually green on inbound was 9 hours ago) that I'm hiding it. We'll stop expecting it to run, only I'll see it and star the failures against this bug, and we'll stop bombarding people - mostly add-ons manager people - with failure bugspam that they can only fix by fixing the fact that we were already teetering on the OOM cliff before their tests even started running.
AN CLUE: while I was typing up the bureaucratic bug in the Tinderboxpushlog component about "Unhide when it gets better," I was about to type the "hidden on..." line including mozilla-aurora, since we merged to there from a horribly broken tree this morning, when I realized that I haven't had to star any ASan b-c failures there since the merge. Perfectly green, four runs in a row, which is three more in a row than we've managed in days, if not weeks, on trunk. So: does --disable-profiling, or one of the other changes between trunk and aurora that seem unlikely to have this effect, make this problem disappear?
To bolster my OOM claim, https://tbpl.mozilla.org/php/getParsedLog.php?id=29817157&full=1&branch=mozilla-inbound#error0 is ASan saying that it's OOM.
Maybe it's not profiling, since https://tbpl.mozilla.org/?showall=1&tree=Profiling builds with --disable-profiling, and while it's a bit better than inbound/central/fx-team, it's not quite perfect, nor as good as aurora.
Do we have a regression range for this? When we started with ASan b-c, it was perfectly green all the time. If it's been troublesome for weeks now, why didn't we investigate earlier? Is there a particular test going OOM, or is it random? I'd guess it's either a test or a build option that makes the difference between aurora and central, but a regression range would really help.
It seems to have deteriorated slowly. One of the problems is that ASan browser-chrome is (obviously) only one of many browser-chrome jobs run per push, so even a drastic deterioration in its failure rate gets averaged out by all the rest when you're jumping from tree to tree, starring in only-unstarred mode.
(The sheriffs are more than eager to back things out when they're spotted increasing a failure rate, and do so quite regularly, though on occasion the rate increased so gradually that this wasn't possible.)
Assuming that memory pressure increased in general, here's a try run with a smaller quarantine size for the 2-4 GB memory builders: https://tbpl.mozilla.org/?tree=Try&rev=758951544a82 I'm still curious, though, what makes the difference between aurora and inbound in this case.
Assigning to myself. Philor: I was trying to figure out the influence of profiling, but as far as I can see, ASan doesn't build with --enable-profiling (the regular nightly config has it; the nightly-asan config does not). Did we make this (or any other) config change somewhere else, outside of the mozconfigs? That said, mochitest-bc was always close to the memory limits; it could be that some changes just increased memory usage a little, causing random failures. The try push I made should help in that case.
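For reference, here's a hypothetical side-by-side of the two mozconfigs being compared - the file paths and the exact option lists are assumptions from memory, not copied from the tree, but they illustrate why ASan builds were already effectively running without profiling:

```shell
# Hypothetical contents, for illustration only.
#
# browser/config/mozconfigs/linux64/nightly (regular nightly):
ac_add_options --enable-profiling

# browser/config/mozconfigs/linux64/nightly-asan (ASan build):
ac_add_options --enable-address-sanitizer
ac_add_options --disable-jemalloc
# note: no --enable-profiling line here
```

If the difference between trunk and aurora were profiling-related, the ASan config would have had to pick the option up from somewhere outside these files.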
Assignee: nobody → choller
If you want a clearer example of why it's hard to spot things like this at the moment they happen, take a look at bug 920976. Pick the instance that (in hindsight) seems to be the start, either the first one yesterday or the double-starred one on the 21st, open the log, click the "push abc123" link at the top to get to tbpl for that push, then click the down-arrow. Was it actually because of sunfish's self-reviewed followup to bug 925729? Was it something in a merge from fx-team that only became broken when it met something else on inbound? Was it one of the pushes that hardly ran any tests at all?
Oops, bug 920978, which is a rather neat example, not 976 whatever it is.
Created attachment 824129 [details] [diff] [review] asan-bc-oom.patch Decrease the quarantine_size for mochitests running under ASan by 20% to solve the memory issues on mochitest-bc.
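For illustration, here's a minimal sketch of the kind of change involved, assuming ASan's byte-valued quarantine_size option and its usual 256 MB default; the variable names and the exact mechanism are hypothetical and not taken from the actual patch:

```shell
# Hypothetical sketch: shrink ASan's free-memory quarantine to cut peak
# memory during mochitest-bc. quarantine_size is in bytes; 256 MB is the
# commonly cited default. A 20% reduction leaves ~80% of that.
DEFAULT_QUARANTINE=$((1 << 28))                      # 268435456 bytes (256 MB)
REDUCED_QUARANTINE=$((DEFAULT_QUARANTINE * 4 / 5))   # 20% smaller

# The test harness would export this before launching the browser.
export ASAN_OPTIONS="quarantine_size=${REDUCED_QUARANTINE}"
echo "$ASAN_OPTIONS"
```

The trade-off is that a smaller quarantine recycles freed memory sooner, which slightly reduces ASan's ability to catch long-delayed use-after-free bugs in exchange for a lower memory ceiling.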
Attachment #824129 - Flags: review?(ted)
Comment on attachment 824129 [details] [diff] [review] asan-bc-oom.patch rs=ted on IRC.
Attachment #824129 - Flags: review?(ted) → review+
That's done a great job of exposing https://tbpl.mozilla.org/php/getParsedLog.php?id=29854349&tree=Mozilla-Inbound, which must have landed sometime between the parent of your try push and yesterday afternoon on inbound, where it's visible if you know to look past all the other failures.
Whiteboard: [leave open]
Should we try to back this out now?
Fwiw, I've checked the last pushes to mozilla-inbound and mochitest-bc was always green. Maybe the failure in comment 11 is a rare/unrelated intermittent? I suggest we unhide ASan mochitest-bc again, given that it's green now. The only thing we should maybe check before doing so is whether the OOM patch I made is still necessary, as Ryan pointed out. I'm currently doing a try push with a backout of my OOM patch to check that.
It's not at all rare; it's just not quite permaorange. But apparently people think that since ASan b-c is hidden for one bustage followed by another bustage, it's fine to break it again, so I'm unhiding it to stop the pile-on.
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → WORKSFORME