Closed Bug 478165 Opened 15 years ago Closed 6 years ago

TinderboxPrint'ing both FAIL and timeout :-/

Categories

(Testing :: General, defect, P5)

defect

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: sgautherie, Unassigned)

References

()

Details

(Whiteboard: [unittest])

Attachments

(1 file)

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1234413647.1234416025.17818.gz
Linux mozilla-central unit test on 2009/02/11 20:40:47
{
TinderboxPrint: mochitest<br/><em class="testfail">FAIL</em>
NEXT ERROR buildbot.slave.commands.TimeoutError: command timed out: 300 seconds without output, killing pid 868
TinderboxPrint: mochitest <em class="testfail">timeout</em><br/>
}

1) (nit)

Based on the other TinderboxPrints format, the timeout |<br/>| should move and replace the space.

2) (core issue)

The first TinderboxPrint comes from MozillaMochitest.createSummary(),
and was added by
http://hg.mozilla.org/build/buildbotcustom/rev/edad22775516
(but "this" code already existed in the other tests ... so all tests should be affected.)

The second TinderboxPrint comes from ShellCommandReportTimeout.evaluateCommand().

Trouble is we should have one "TinderboxPrint: mochitest[...]" only.
(In this case, I think we should TinderboxPrint the timeout report only.)
Isn't a timeout a failure?
Sure, but a "FAIL" is not a "timeout", etc.
Not really wrong, but quite confusing yet, I say.
Summary: TinderboxPrint'ing both FAIL and timeout now :-/ → TinderboxPrint'ing both FAIL and timeout :-/
Not every FAIL is a timeout, no, but every timeout *is* a FAIL. Have you seen a case where it prints timeout when it's not really a timeout?
(In reply to comment #3)
> Not every FAIL is a timeout, no, but every timeout *is* a FAIL.

Obviously ! (Then what ?)

> Have you seen a case where it prints timeout when it's not really a timeout?

Not that I am aware of. (Should I ?)
(In reply to comment #4)
> > Have you seen a case where it prints timeout when it's not really a timeout?
> 
> Not that I am aware of. (Should I ?)

What's the bug here the? If timeout and FAIL are printed when we timeout, and only FAIL is printed on regular failures, I don't see a problem.
Current display is
{
mochitest
FAIL
mochitest timeout

}

Before looking closely, it looks like 2 tests suites had a failure, not 1 only.

Reporting 2 keywords looks like there would have been 2 failures, not 1 only.
This is different than getting "mochitest FAIL LEAK" (= a failure which caused a leak) for example.

The current report is not technically wrong,
but sure is not user friendly.
OK, are you going to take this on?
Ftb, I would need someone to tell me what the best way to fix this is.
Currently, I plan to work on bug 473259 first.
Moving to future for now.
Component: Release Engineering → Release Engineering: Future
There's a worse case too:

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1234793738.1234798260.27879.gz&fulltext=1
Linux mozilla-central unit test on 2009/02/16 06:15:38
{
TinderboxPrint: TUnit<br/>492/0
NEXT ERROR buildbot.slave.commands.TimeoutError: command timed out: 300 seconds without output, killing pid 31120
TinderboxPrint: check <em class="testfail">timeout</em><br/>
}

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1234793739.1234798354.28329.gz&fulltext=1
WINNT 5.2 mozilla-central unit test on 2009/02/16 06:15:39
{
TinderboxPrint: TUnit<br/>530/0
buildbot.slave.commands.TimeoutError: command timed out: 300 seconds without output
TinderboxPrint: check <em class="testfail">timeout</em><br/>
}

http://tinderbox.mozilla.org/showlog.cgi?log=Thunderbird/1234793858.1234797392.23947.gz&fulltext=1
Linux comm-central check on 2009/02/16 06:17:38
{
TinderboxPrint: TUnit<br/>375/0
buildbot.slave.commands.TimeoutError: command timed out: 1800 seconds without output, killing pid 31324
TinderboxPrint: check <em class="testfail">timeout</em><br/>
}

1)
'TUnit'/'check' mismatch, which *is* confusing, as it *does* looks like 2 different tests suites.

2)
1st line seems to report success and 2nd failure: can't be more confusing than that...
Fixes comment 0 point 1 and comment 10 point 1.
(untested, but obvious)
Attachment #362591 - Flags: review?
Attachment #362591 - Flags: review? → review?(ccooper)
Attachment #362591 - Flags: review?(ccooper) review?(ccooper)
Attachment #362591 - Flags: review+
Attachment #362591 - Flags: checked‑in+
Attachment #362591 - Flags: checked‑in+ review+
Comment on attachment 362591 [details] [diff] [review]
(Av1) Fix timeout error format and unify test names
[Checkin: Comment 12]

changeset:   192:4df7f3be22bd
Comment 10 example with Mochitest:
{
http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey/1237951288.1237963779.7861.gz
Linux comm-central dep unit test on 2009/03/24 20:21:28

mochitest
35015/9/1062
mochitest timeout
}

Note that mochitest total number should be more like 75k than 35k.
It looks like:
1) the test harness aborted the test suite with |Too many test timeouts, giving up.|.
2) then printed the figures it had so far, then |SimpleTest FINISHED|.
3) then buildbot detected an additional timeout ... as if runtests.py had failed to actually exit ... which reminds of
{
http://mxr.mozilla.org/mozilla-central/source/testing/mochitest/runtests.py.in

474   status = automation.runApp(testURL, browserEnv, options.app,

485   server.stop()

487   automation.processLeakLog(LEAK_REPORT_FILE, options.leakThreshold)

492   # hanging due to non-halting threads is no fun; assume we hit the errors we
493   # were going to hit already and exit with a success code
494   sys.exit(0)
}
Note that the leak report is missing, as if server.stop() call never returned.

To summarize, I see 3 issues:
1- this bug, TinderboxPrinting both the figures and timeout (separately).
2- maybe the not-run tests number should be added to the ToDo (or Fail) number by the test harness, or something.
3- the server (thread) not stoppping.
Mass move of bugs from Release Engineering:Future -> Release Engineering. See
http://coop.deadsquid.com/2010/02/kiss-the-future-goodbye/ for more details.
Component: Release Engineering: Future → Release Engineering
Priority: -- → P3
Priority: P3 → P5
Serge: are you still planning on working on this? Still working on bug 473259 first?

No one else is clamoring for a fix here, so it's all on you. I'm happy to do reviews though.
Attachment #362591 - Attachment description: (Av1) Fix timeout error format and unify test names → (Av1) Fix timeout error format and unify test names [Checkin: Comment 12]
(In reply to comment #15)
> Serge: are you still planning on working on this? Still working on bug 473259
> first?

I'm still interested in all these bugs, but I +/- gave up a year ago.
Very recently, I got some progress on bug 427500 (as an example), so I'm not totally stalled anymore, though I've also updated real life to higher priority...
Whiteboard: [unittest]
Assignee: nobody → ccooper
(In reply to comment #13)
> To summarize, I see 3 issues:
> 1- this bug, TinderboxPrinting both the figures and timeout (separately).
> 2- maybe the not-run tests number should be added to the ToDo (or Fail) number
> by the test harness, or something.
> 3- the server (thread) not stoppping.

Based on this, I'm going to file a new bug for #1, and move this bug over to the Testing product to sort out #2 and #3.
Assignee: coop → nobody
Component: Release Engineering → General
Flags: checked-in+
Product: mozilla.org → Testing
QA Contact: release → general
Version: other → Trunk
Mass closing bugs with no activity in 2+ years. If this bug is important to you, please re-open.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: