Closed Bug 876159 Opened 11 years ago Closed 11 years ago

Test runs which fail and then exceed the maximum log size are marked as passing

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: philor, Unassigned)

References

Details

Attachments

(5 files, 1 obsolete file)

detect truncation -> burn 11 years ago Aki Sasaki (not active) 1.16 KB, patch	philor : review+ mozilla : checked-in+	Details \| Diff \| Splinter Review
[mozharness] set self.return_code 11 years ago Aki Sasaki (not active) 7.59 KB, patch		Details \| Diff \| Splinter Review
[mozharness] set self.return_code, lower touch version 11 years ago Aki Sasaki (not active) 2.21 KB, patch	philor : review+ mozilla : checked-in+	Details \| Diff \| Splinter Review
[custom] rc_eval_func instead of regex_log_evaluator 11 years ago Aki Sasaki (not active) 2.35 KB, patch	philor : review+ mozilla : checked-in+	Details \| Diff \| Splinter Review
[mozharness] set failure if we surpass max log size 11 years ago Aki Sasaki (not active) 5.66 KB, patch	philor : review+ mozilla : checked-in+	Details \| Diff \| Splinter Review
move worst_level() to LogMixin, use it in buildbot_status() 11 years ago Aki Sasaki (not active) 11.08 KB, patch	Callek : review+ mozilla : checked-in+	Details \| Diff \| Splinter Review

Phil Ringnalda (:philor)

Reporter

Description

•

11 years ago

https://tbpl.mozilla.org/php/getParsedLog.php?id=23400763&tree=Mozilla-Inbound is a debug Ubuntu crashtest run, which exceeded the maximum log size, but is marked as green.

Every tree is closed.

Phil Ringnalda (:philor)

Reporter

Comment 1

•

11 years ago

Those crashtests may well be passing, the summary comes from https://tbpl.mozilla.org/php/getParsedLog.php?id=23402698&tree=Try where Ms2ger spotted it, because his try push did break mochitest-1, but the debug ones which exceeded the maximum log size were marked as green.

Phil Ringnalda (:philor)

Reporter

Comment 2

•

11 years ago

And we can easily get the linux crashtests down below the log limit, should just be a matter of 'hg backout -r 6115fce2399d'. I'm only holding off on that because we might need the permafailgreen to test fixing detection of it.

Doesn't solve all our undetected problems, though, because https://tbpl.mozilla.org/php/getParsedLog.php?id=23401583&tree=Mozilla-Inbound is a Win7 debug mochitest-2 that OOMs and overflows the log (intermittently).

Andrew McCreight [:mccr8]

Comment 3

•

11 years ago

r=me to comment out that line with a reference to this bug

Andrew McCreight [:mccr8]

Comment 4

•

11 years ago

(xpcom/base/nsCycleCollector.cpp, line 1890)

Andrew McCreight [:mccr8]

Comment 5

•

11 years ago

Or you can just delete that warning, rather than commenting it out.  I was planning on removing it next week anyways.

Phil Ringnalda (:philor)

Reporter

Comment 6

•

11 years ago

My possibly-flawed understand of how we used to catch these is that http://mxr.mozilla.org/build/source/buildbotcustom/steps/unittest.py#230 insisted on seeing the "Failed: 0" line, and if it didn't see it, then the run failed.

http://mxr.mozilla.org/build/source/mozharness/mozharness/mozilla/testing/unittest.py#120 looks to be the mozharness equivalent, where it thinks that in that case fail_count will still be the -1 it was inited as, but apparently that doesn't work: both truncated logs which have an error in the non-truncated part and should have set fail_count to > 0, and truncated logs which don't have an error in the non-truncated part and should have left it at -1 wind up with SUCCESS.

Bill Gianopoulos [:WG9s]

Comment 7

•

11 years ago

Well, there are other issues here.  We have one case where even in the truncated log, there are test failures yet it is green so it seems that the log being truncated bypasses running through the parser at all and sets the build green.  I think bypassing the parser altogether is fine, but if the log is so much bigger than expected that it has to be truncated, that condition alone should be sufficient to turn the build red.

Aki Sasaki (not active)

Comment 8

•

11 years ago

Attached patch detect truncation -> burn — Details — Splinter Review

Not sure if this will work, but it's worth a shot.

Attachment #754171 - Flags: review?(philringnalda)

Phil Ringnalda (:philor)

Reporter

Comment 9

•

11 years ago

Comment on attachment 754171 [details] [diff] [review]
detect truncation -> burn

I'm pretty sure it won't work, because the message is a buildbot HEADER rather than an ANYTHINGYOUCANACTUALLYSEE, but Callek thought it was possible that it was visible to log parsing despite that, so if you're willing to do a reconfig I'm more than willing to trigger a crashtest that will overflow to see.

Attachment #754171 - Flags: review?(philringnalda) → review+

Aki Sasaki (not active)

Comment 10

•

11 years ago

The other approach we can take is to use rc_eval_func instead of regex_eval_func, and change the scripts to set a return status as opposed to spitting out a string.

That approach would take a lot more changes, so I'm going to reconfig to see if this works first.

Aki Sasaki (not active)

Comment 11

•

11 years ago

Comment on attachment 754171 [details] [diff] [review]
detect truncation -> burn

http://hg.mozilla.org/build/buildbotcustom/rev/f05c900f0d29

Attachment #754171 - Flags: checked-in+

Aki Sasaki (not active)

Comment 12

•

11 years ago

Also, we've been talking about uploading logs, configs, etc. from test jobs for a while now.  That would allow for uploading the un-truncated info log, the raw log and other log level logs, the localconfig.json, screenshots, or anything else we put in the upload dir for uploading.  There would still need to be something telling buildbot what the status is, though.

Phil Ringnalda (:philor)

Reporter

Comment 13

•

11 years ago

(In reply to Andrew McCreight [:mccr8] from comment #3)
> r=me to comment out that line with a reference to this bug

Not enough, https://tbpl.mozilla.org/php/getParsedLog.php?id=23409164&full=1&branch=try is the same OOM without the CC warning, still overflowing. Kill the nsDeque OOM warning? Kill imptests? I'll kill imptests if you tell me to, just tell me to, I don't mind, I'll kill them, say the word.

Aki Sasaki (not active)

Comment 14

•

11 years ago

Attached patch [mozharness] set self.return_code (obsolete) — Details — Splinter Review

Untested, but should work in theory.
We'd need to change the regex_log_evaluator() to rc_eval_func() in buildbotcustom.

Aki Sasaki (not active)

Comment 15

•

11 years ago

Attached patch [mozharness] set self.return_code, lower touch version — Details — Splinter Review

Attachment #754179 - Attachment is obsolete: true

Aki Sasaki (not active)

Comment 16

•

11 years ago

Attached patch [custom] rc_eval_func instead of regex_log_evaluator — Details — Splinter Review

Aki Sasaki (not active)

Updated

•

11 years ago

Attachment #754181 - Flags: review?

Justin Wood (:Callek)

Comment 17

•

11 years ago

Comment on attachment 754184 [details] [diff] [review]
[custom] rc_eval_func instead of regex_log_evaluator

5.. manually canceled?

Aki Sasaki (not active)

Updated

•

11 years ago

Attachment #754184 - Flags: review?

Aki Sasaki (not active)

Comment 18

•

11 years ago

(In reply to Justin Wood (:Callek) from comment #17)
> Comment on attachment 754184 [details] [diff] [review]
> [custom] rc_eval_func instead of regex_log_evaluator
> 
> 5.. manually canceled?

How would we detect that in mozharness?

Masatoshi Kimura [:emk]

Comment 19

•

11 years ago

Comment on attachment 754181 [details] [diff] [review]
[mozharness] set self.return_code, lower touch version

You need to assign a reviewer.

Phil Ringnalda (:philor)

Reporter

Updated

•

11 years ago

Attachment #754181 - Flags: review? → review+

Phil Ringnalda (:philor)

Reporter

Comment 20

•

11 years ago

Comment on attachment 754184 [details] [diff] [review]
[custom] rc_eval_func instead of regex_log_evaluator

"Dunno": RETRY always scares me because there are so many things I dunno, but there's no better time than now to risk infinite retries :)

Attachment #754184 - Flags: review? → review+

Aki Sasaki (not active)

Comment 21

•

11 years ago

Comment on attachment 754181 [details] [diff] [review]
[mozharness] set self.return_code, lower touch version

https://hg.mozilla.org/build/mozharness/rev/d90aad39ac1f

Landed on default.
I'll merge to production once the reconfig's done.

Attachment #754181 - Flags: checked-in+

Aki Sasaki (not active)

Comment 22

•

11 years ago

Comment on attachment 754184 [details] [diff] [review]
[custom] rc_eval_func instead of regex_log_evaluator

http://hg.mozilla.org/build/buildbotcustom/rev/117b78891fc5

Merged to production-0.8, starting reconfig.

Attachment #754184 - Flags: checked-in+

Phil Ringnalda (:philor)

Reporter

Comment 23

•

11 years ago

The Win debug OOM looks like it's maybe 8% based on my try run, so it doesn't block reopening like the crashtest does, but I'll be coming after it with a machete before long.

Aki Sasaki (not active)

Comment 24

•

11 years ago

Reconfig done, mozharness patch transplanted to production.

Aki Sasaki (not active)

Comment 25

•

11 years ago

Attached patch [mozharness] set failure if we surpass max log size — Details — Splinter Review

with unit tests that pass.

Attachment #754189 - Flags: review?(philringnalda)

Phil Ringnalda (:philor)

Reporter

Comment 26

•

11 years ago

Comment on attachment 754189 [details] [diff] [review]
[mozharness] set failure if we surpass max log size

>+def cleanup():
>+    gc.collect()
>+    # I'm using MercurialVCS here because that gives me access to
>+    #
>+    c = CleanupObj()

This is not going to be a surprise, but I've got no idea what that comment is saying.

Attachment #754189 - Flags: review?(philringnalda) → review+

Aki Sasaki (not active)

Comment 27

•

11 years ago

Comment on attachment 754189 [details] [diff] [review]
[mozharness] set failure if we surpass max log size

Landed to default, transplanted to production.  And yeah, that comment was left over from a different file I copied from (cleaned up now).
http://hg.mozilla.org/build/mozharness/rev/7416a68e544b

Attachment #754189 - Flags: checked-in+

Andrew McCreight [:mccr8]

Comment 28

•

11 years ago

(In reply to Phil Ringnalda (:philor) from comment #13)
> Not enough,
> https://tbpl.mozilla.org/php/getParsedLog.php?id=23409164&full=1&branch=try
> is the same OOM without the CC warning, still overflowing. Kill the nsDeque
> OOM warning? Kill imptests? I'll kill imptests if you tell me to, just tell
> me to, I don't mind, I'll kill them, say the word.

Yeah, I think killing the nsDeque OOM warning is good as a short term measure to prevent log badness.  I should be able to add it back later this week, when I make it so the CC doesn't blast it out a million times when things fall apart.

It is just papering over extreme badness in that test (bug 875585), but you are going to need to get somebody else's approval for disabling it. ;)  Ms2ger should have a better idea who might have an opinion.

Andrew McCreight [:mccr8]

Updated

•

11 years ago

Depends on: 875585

Phil Ringnalda (:philor)

Reporter

Comment 29

•

11 years ago

Never been so happy to see a red job; trees reopened, *thank you Aki*.

Severity: blocker → normal

Phil Ringnalda (:philor)

Reporter

Comment 30

•

11 years ago

Filed bug 876545 on dealing with the way that debug mochitest-1 is so close to 50MB that a single timeout and screenshot puts it over.

Aki Sasaki (not active)

Comment 31

•

11 years ago

Attached patch move worst_level() to LogMixin, use it in buildbot_status() — Details — Splinter Review

Attachment #756301 - Flags: review?(bugspam.Callek)

Justin Wood (:Callek)

Comment 32

•

11 years ago

Comment on attachment 756301 [details] [diff] [review]
move worst_level() to LogMixin, use it in buildbot_status()

Review of attachment 756301 [details] [diff] [review]:
-----------------------------------------------------------------

This looks ok to me

::: mozharness/mozilla/buildbot.py
@@ +83,5 @@
>              if not level:
>                  level = TBPL_STATUS_DICT[tbpl_status]
> +            self.worst_buildbot_status = self.worst_level(tbpl_status, self.worst_buildbot_status, TBPL_STATUS_DICT.keys())
> +            if self.worst_buildbot_status != tbpl_status:
> +                self.info("Current worst status %s is worse; keeping it." % self.worst_buildbot_status)

not a fan of how this english reads, but its a very minor bikeshed nit with no new shed color in mind, so no worry.

Attachment #756301 - Flags: review?(bugspam.Callek) → review+

Aki Sasaki (not active)

Comment 33

•

11 years ago

Comment on attachment 756301 [details] [diff] [review]
move worst_level() to LogMixin, use it in buildbot_status()

http://hg.mozilla.org/build/mozharness/rev/43e1d7a78e96 , transplanted to production.

Attachment #756301 - Flags: checked-in+

Phil Ringnalda (:philor)

Reporter

Updated

•

11 years ago

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Assignee

Updated

•

11 years ago

Product: mozilla.org → Release Engineering

Nobody; OK to take it and work on it

Assignee

Updated

•

6 years ago

Component: General Automation → General

You need to log in before you can comment on or make changes to this bug.