Closed Bug 468023 Opened 16 years ago Closed 16 years ago

Synchronize RefTest results to tinderbox waterfall from (new) log

Categories

(Testing :: Reftest, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
mozilla1.9.1b3

People

(Reporter: sgautherie, Assigned: sgautherie)

References

()

Details

(Keywords: fixed1.9.1, Whiteboard: [fixed1.9.1b3: Bv2a])

Attachments

(3 files, 8 obsolete files)

2.59 KB, patch
Details | Diff | Splinter Review
8.62 KB, patch
coop
: review+
Details | Diff | Splinter Review
1.61 KB, patch
coop
: review+
Details | Diff | Splinter Review
Bug 466372 comment 1:
{{
From  Serge Gautherie   2008-11-23 08:19:53 PST

Fwiw, the figures are a little different from the (Windows SeaMonkey)
waterfall:
{
TinderboxPrint: reftest<br/>2326/0/93
}
I would guess
2326 = Pass + Fail (which don't on the tindrboxes) + LoadOnly
93 = +/- "Known Fail" (without the "Random only") + Skip
}}

***

Example of current outputs:

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1228438741.1228442157.13690.gz&fulltext=1
Linux mozilla-central moz2-linux-slave07 dep unit test on 2008/12/04 16:59:01
REFTEST INFO | Pass: 2351
REFTEST INFO | KnownFail: 98
REFTEST INFO | Random: 30
REFTEST INFO | LoadOnly: 4
REFTEST INFO | Skip: 15
TinderboxPrint: reftest<br/>2362/0/113
REFTEST INFO | LoadOnly: 795
REFTEST INFO | Skip: 2
TinderboxPrint: crashtest<br/>799/0/2

***

I don't know where/how the (3) TinderboxPrint numbers are calculated...

But I'd like to suggest to calculate them as the sum of the new items:
{
       8 +  // The errors...
       9 +  Exception: 0,
      10 +  FailedLoad: 0,
      11 +  UnexpectedFail: 0,
      12 +  UnexpectedPass: 0,

      13 +  // The successes...
      14 +  Pass: 0,
      15 +  KnownFail : 0,
      16 +  Random : 0,
      17 +  LoadOnly: 0,

      18 +  // The unknowns.
      19 +  Skip: 0,
}
We should not do it this way.

The first number (passed tests) should be the sum of the Pass and the LoadOnly.  The second number (unexpected results that turn the tree orange) should be the sum of the Exception (maybe, I'm not even sure what this is), FailedLoad, UnexpectedFail, and UnexpectedPass.  The third number (known failures) should be the some of the KnownFail, Skip, and Random.

However, I don't think there's any need to rewrite the code in question, nor do I see how such a rewrite would change the results (except possibly for the Exception category).
(In reply to comment #1)
> We should not do it this way.

Fine with me.

> However, I don't think there's any need to rewrite the code in question, nor do
> I see how such a rewrite would change the results

At least, I'm puzzled by the 4 extra crashtests reported on the waterfall :-/
Actually, it looks like the current code is broken, so maybe we should do this.  In particular, the current code does this:

REFTEST TEST-FAIL | | EXCEPTION: Error in manifest file file:///builds/slave/trunk_linux-7/build/layout/reftests/text/reftest.list line 28
REFTEST FINISHED: Slowest test took 0ms (undefined)
REFTEST INFO | Result summary:
REFTEST INFO | Exception: 1
REFTEST INFO | Total canvas count = 0
program finished with exit code 0
TinderboxPrint: reftest<br/>5/0/0

Note that it's counting the info lines as passing tests.  We're also printing TEST-FAIL instead of TEST-UNEXPECTED-FAIL for exceptions.
(In reply to comment #3)
> We're also printing TEST-FAIL instead of TEST-UNEXPECTED-FAIL for exceptions.

Bug 468476 fixed that.
URL: 468476
URL: 468476
Depends on: 468476
(In reply to comment #3)
> Actually, it looks like the current code is broken

What is the (reftest part of) buildbot-configs/*/mozbuild.py used for ?
(compared to buildbotcustom/*/steps.py)
Per comment 1 and comment 3.

I did some manual unit-testing of the new code,
but I can't test the file as a whole.

Asking r? from David for the "layout" p-o-v and Ted for the "BuildBot" p-o-v.
Assignee: nobody → sgautherie.bz
Status: NEW → ASSIGNED
Attachment #352027 - Flags: review?(ted.mielczarek)
Attachment #352027 - Flags: review?(dbaron)
Why not do the summing in reftest.js?  (Maybe call them successful, unexpected, and known problems?)  That would also give people the ability to compare their results to the tinderbox results.
reftest.js could perhaps print a summary like this (instead of what it's printing now):

REFTEST INFO | 1500 successful (1400 pass, 100 load only)
REFTEST INFO | 0 unexpected (0 unexpected fail, 0 unexpected pass,
REFTEST INFO |               0 failed load, 0 exceptions)
REFTEST INFO | 160 known problems (100 known fail, 50 random, 10 skipped)

(maybe merging the 2nd and 3rd lines)
Attached patch (Bv1) New summarized log format (obsolete) — Splinter Review
[Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.2a1pre) Gecko/20081209 Minefield/3.2a1pre] (home, optim default) (W2Ksp4)
(http://hg.mozilla.org/mozilla-central/rev/85507cfcdda8)

Per comment 8:
{
REFTEST INFO | Result summary:
REFTEST INFO | Successful: 2410 (2406 pass, 4 load only)
REFTEST INFO | Unexpected: 1 (0 unexpected fail, 1 unexpected pass, 0 failed load, 0 exception)
REFTEST INFO | Known problems: 111 (92 known fail, 13 random, 6 skipped)
}
Attachment #352103 - Flags: review?(dbaron)
Per comment 7.

I did some manual unit-testing of the new code,
but I can't test the file as a whole.

Asking r? from David for the "layout" p-o-v and Ted for the "BuildBot" p-o-v.
Attachment #352027 - Attachment is obsolete: true
Attachment #352106 - Flags: review?(ted.mielczarek)
Attachment #352106 - Flags: review?(dbaron)
Attachment #352027 - Flags: review?(ted.mielczarek)
Attachment #352027 - Flags: review?(dbaron)
Comment on attachment 352103 [details] [diff] [review]
(Bv1) New summarized log format

I don't think we want the count != 0 checks around the groups.  We might want them around the items, but I'd say just take them out completely.
Attachment #352103 - Flags: review?(dbaron) → review-
Comment on attachment 352106 [details] [diff] [review]
(Av2) Use the new summarized 'REFTEST INFO' counts

Might it be faster if the regular expression checked that REFTEST was at the beginning of the line?

Also, given the other changes, it seems like you should require all 3 INFO lines to be present, and give an error if they aren't.

Other than that, this looks fine to me, although I'm really not an official reviewer for this code.
Attachment #352106 - Flags: review?(dbaron) → review-
Component: Testing → Reftest
Product: Core → Testing
Version: Trunk → unspecified
Attached patch (Bv2) New summarized log format (obsolete) — Splinter Review
Bv1, with comment 11 suggestion(s).
Attachment #352103 - Attachment is obsolete: true
Attachment #352465 - Flags: review?(dbaron)
Comment on attachment 352465 [details] [diff] [review]
(Bv2) New summarized log format

r=dbaron

Could you also fix or remove the comments in the initialization of gTestResults, since they currently categorize them incorrectly?
Bv2, with comment 14 suggestion(s).
Attachment #352465 - Attachment is obsolete: true
Comment on attachment 352630 [details] [diff] [review]
(Bv2a) New summarized log format
[Checkin: Comment 16 & 17]

http://hg.mozilla.org/mozilla-central/rev/dbe0fab46324
Attachment #352630 - Attachment description: (Bv2a) New summarized log format → (Bv2a) New summarized log format [Checkin: Comment 16]
Keywords: checkin-needed
Whiteboard: [c-n: Bv2a baking for 1.9.1]
Comment on attachment 352630 [details] [diff] [review]
(Bv2a) New summarized log format
[Checkin: Comment 16 & 17]

http://hg.mozilla.org/releases/mozilla-1.9.1/rev/f7cb13b74007
Attachment #352630 - Attachment description: (Bv2a) New summarized log format [Checkin: Comment 16] → (Bv2a) New summarized log format [Checkin: Comment 16 & 17]
Keywords: checkin-needed
Whiteboard: [c-n: Bv2a baking for 1.9.1] → [Bv2a is fixed1.9.1]
Av2, with comment 12 suggestion(s).

(In reply to comment #12)
> Might it be faster if the regular expression checked that REFTEST was at the
> beginning of the line?

This is what .match() does, != .search() ;-)

> Also, given the other changes, it seems like you should require all 3 INFO
> lines to be present, and give an error if they aren't.

I ported the code used elsewhere in this file.
(I'm planning a followup bug to sync all these...)

Asking r? from David for the "layout" p-o-v and Ben for the "BuildBot" p-o-v.
Attachment #352106 - Attachment is obsolete: true
Attachment #352755 - Flags: review?(dbaron)
Attachment #352755 - Flags: review?(bhearsum)
Attachment #352106 - Flags: review?(ted.mielczarek)
(In reply to comment #18)
> (In reply to comment #12)
> > Might it be faster if the regular expression checked that REFTEST was at the
> > beginning of the line?
> 
> This is what .match() does, != .search() ;-)

How does that expression match, then, given that it doesn't have a .* at the end to match the end of the line?
Comment on attachment 352755 [details] [diff] [review]
(Av3) Use the new summarized 'REFTEST INFO' counts

Er, never mind my previous question... although I don't see why that makes sense as a primitive operation called "match".

This looks fine to me, although bhearsum should be the main reviewer (or designate someone else to be).
Attachment #352755 - Flags: review?(dbaron) → review+
One other thought is that in addition to printing "FAIL" it might be good to make the test actually fail.
Av3, with comment 21 suggestion(s).

*Adds |emphasizeFailureText()| and uses it everywhere.
*Rewrites |summaryText()|:
 *Removes superfluous |knownFail|.
 *Separates |emphasizeFailureText()| calls for counts and leak.
*Fixes nits in Av3 patch.

***

(In reply to comment #21)
> One other thought is that in addition to printing "FAIL" it might be good to
> make the test actually fail.

Did I understood you correctly or were you thinking about something else/more ?
Attachment #352755 - Attachment is obsolete: true
Attachment #352819 - Flags: review?(bhearsum)
Attachment #352755 - Flags: review?(bhearsum)
I was saying that if the reftest output can't be parsed, tinderbox ought to turn orange, not just print "FAIL" in the build's table cell.
(In reply to comment #23)
> I was saying that if the reftest output can't be parsed, tinderbox ought to
> turn orange, not just print "FAIL" in the build's table cell.

That's what I thought (and I concur), but I'm not sure how to achieve this:
I would guess |evaluateCommand()| needs to be modified ? (How ?)
Av3a, with nits fixed.
Attachment #352819 - Attachment is obsolete: true
Attachment #352915 - Flags: review?(bhearsum)
Attachment #352819 - Flags: review?(bhearsum)
QA Contact: testing → reftest
Ben, ping ?
Depends on: 468577
Attachment #352915 - Flags: review?(ccooper)
Attachment #352915 - Flags: review?(bhearsum) → review+
Comment on attachment 352915 [details] [diff] [review]
(Av3b) Use the new summarized 'REFTEST INFO' counts, ++

Looks fine to me.
Attachment #352915 - Flags: review?(ccooper)
Target Milestone: --- → mozilla1.9.1b3
Version: unspecified → Trunk
Whiteboard: [Bv2a is fixed1.9.1] → [c-n: Av3b to cvs/1.9.0] [Bv2a is fixed1.9.1]
Av3b, with a nit fix.

Ben, could you check this in ?
Attachment #352915 - Attachment is obsolete: true
Whiteboard: [c-n: Av3b to cvs/1.9.0] [Bv2a is fixed1.9.1] → [c-n: Av3c to cvs/1.9.0] [Bv2a is fixed1.9.1]
Unfortunately due to a bug with our l10n builds I can't do enable this right away. I'll make sure this gets in the next downtime though.
Comment on attachment 355929 [details] [diff] [review]
(Av3c) Use the new summarized 'REFTEST INFO' counts, ++
[Checkin: Comment 30]

Carried forward review and fixed some bitrot.

Checking in steps.py;
/cvsroot/mozilla/tools/buildbotcustom/unittest/steps.py,v  <--  steps.py
new revision: 1.20; previous revision: 1.19
done
Attachment #355929 - Flags: review+
Comment on attachment 355929 [details] [diff] [review]
(Av3c) Use the new summarized 'REFTEST INFO' counts, ++
[Checkin: Comment 30]

These changes will be pushed to the master tomorrow during the downtime.
Attachment #355929 - Attachment description: (Av3c) Use the new summarized 'REFTEST INFO' counts, ++ → (Av3c) Use the new summarized 'REFTEST INFO' counts, ++ [Checkin: Comment 30]
Depends on: 472861
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Whiteboard: [c-n: Av3c to cvs/1.9.0] [Bv2a is fixed1.9.1] → [Bv2a is fixed1.9.1]
Blocks: 473259
(In reply to comment #5)
> What is the (reftest part of) buildbot-configs/*/mozbuild.py used for ?
> (compared to buildbotcustom/*/steps.py)

Ben, would you know the answer to this question ?

(In reply to comment #18)
> (I'm planning a followup bug to sync all these...)

I filed bug 473259.

(In reply to comment #24)
> (In reply to comment #23)
> > I was saying that if the reftest output can't be parsed, tinderbox ought to
> > turn orange, not just print "FAIL" in the build's table cell.
> 
> That's what I thought (and I concur), but I'm not sure how to achieve this:
> I would guess |evaluateCommand()| needs to be modified ? (How ?)

David, I still don't know more than what I wrote (comment and patch), then feel
free to file a bug about this...
(syntax) Not yet tested, but asking for (early) review.

David, is that what you meant in comment 21 ?
Attachment #356650 - Flags: review?(dbaron)
Attachment #356650 - Flags: review?(ccooper)
Comment on attachment 356650 [details] [diff] [review]
(Cv1) Update evaluateCommand() too

This seems like a good change to make, although it's not what I was talking about in comment 21, and I'm not an appropriate reviewer for this code.

(I was referring to the < 0 checks, I think.)
(In reply to comment #32)
> (In reply to comment #5)
> > What is the (reftest part of) buildbot-configs/*/mozbuild.py used for ?
> > (compared to buildbotcustom/*/steps.py)
> 
> Ben, would you know the answer to this question ?
> 

Are you asking about CVS or Mercurial?

I think you're asking about CVS, in which case...only mozilla/tools/buildbot-configs/testing/unittest and mozilla/tools/buildbot-configs/testing/unittest-stage are relevant. leaktest/ and moz2unit/ are not used anymore.
Cv1, fixed and tested.

match() doesn't care for re.MULTILINE mode,
per http://docs.python.org/library/re.html#matching-searching
Attachment #356650 - Attachment is obsolete: true
Attachment #356710 - Flags: review?(ccooper)
Attachment #356650 - Flags: review?(ccooper)
(In reply to comment #34)
> (I was referring to the < 0 checks, I think.)

Then, adding emphasizeFailureText() in comment 22 was what you wanted, wasn't it ?

(In reply to comment #35)
> (In reply to comment #32)
> > (In reply to comment #5)
> > > What is the (reftest part of) buildbot-configs/*/mozbuild.py used for ?
> > > (compared to buildbotcustom/*/steps.py)
> > 
> > Ben, would you know the answer to this question ?
> 
> Are you asking about CVS or Mercurial?

Cvs, yes; are these files in Hg ?
http://mxr.mozilla.org/mozilla/find?string=%2Funittest.*%2Fmozbuild.py$

These files seem to be "near" copies of steps.py,
so I wonder what they are used for and if they should be kept in sync ?
Attachment #356710 - Flags: review?(ccooper) → review+
Comment on attachment 356710 [details] [diff] [review]
(Cv1a) Update evaluateCommand() too
[Checkin: Comment 38]

Checking in steps.py;
/cvsroot/mozilla/tools/buildbotcustom/unittest/steps.py,v  <--  steps.py
new revision: 1.21; previous revision: 1.20
done
Attachment #356710 - Attachment description: (Cv1a) Update evaluateCommand() too → (Cv1a) Update evaluateCommand() too [Checkin: Comment 38]
(In reply to comment #37)
> (In reply to comment #34)
> > (I was referring to the < 0 checks, I think.)
> 
> Then, adding emphasizeFailureText() in comment 22 was what you wanted, wasn't
> it ?

No, I meant making the tinderbox turn orange.
(In reply to comment #39)
> No, I meant making the tinderbox turn orange.

David, *what* (would) makes the tinderbox turn orange ?
(if not comment 36 evaluateCommand() either)
The case where it prints "FAIL" because it can't parse the log.
David, I'm sorry but your answers don't seem to be enough for me to understand what should be done (exactly):
could you rather point me to some code (examples) of what/where ?
In that patch you had the code:

+        if successfulCount < 0 or unexpectedCount < 0 or knownProblemsCount < 0:
+            summary += "FAIL\n"

I'm saying that this should also have code to make the tinderbox turn orange.
(In reply to comment #43)

Oh, let's see:

> In that patch you had the code:

your comment 21 (correctly) referred to Av3 patch.

> +        if successfulCount < 0 or unexpectedCount < 0 or knownProblemsCount <
> 0:
> +            summary += "FAIL\n"

Then, in comment 22 Av3a patch, I enhanced it to
+            summary += '%s\n' % emphasizeFailureText('FAIL')
and that's what (in Av3c patch) got checked in.

> I'm saying that this should also have code to make the tinderbox turn orange.

Then, since your comment 23 answer, I'm under the impression that you think this didn't solve your request.
Did we simply misunderstood each other ?
(Or do you think there actually is something more to do yet ?)
(In reply to comment #44)
> Then, in comment 22 Av3a patch, I enhanced it to
> +            summary += '%s\n' % emphasizeFailureText('FAIL')
> and that's what (in Av3c patch) got checked in.

emphasizeFailureText puts <em class="testfail">...</em> around the text.

> > I'm saying that this should also have code to make the tinderbox turn orange.
> 
> Then, since your comment 23 answer, I'm under the impression that you think
> this didn't solve your request.
> Did we simply misunderstood each other ?
> (Or do you think there actually is something more to do yet ?)

Are you saying that there's code somewhere else in buildbot that magically turns the build orange if there's text with '<em class="testfail">' in it?

If not, there's more to do.
Blocks: 478165
Whiteboard: [Bv2a is fixed1.9.1] → [fixed1.9.1b3: Bv2a]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: