Synchronize RefTest results to tinderbox waterfall from (new) log

RESOLVED FIXED in mozilla1.9.1b3

Status

RESOLVED FIXED
10 years ago
9 years ago

People

(Reporter: sgautherie, Assigned: sgautherie)

Tracking

({fixed1.9.1})

Trunk
mozilla1.9.1b3
fixed1.9.1
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [fixed1.9.1b3: Bv2a], URL)

Attachments

(3 attachments, 8 obsolete attachments)

2.59 KB, patch
Details | Diff | Splinter Review
8.62 KB, patch
coop
: review+
Details | Diff | Splinter Review
1.61 KB, patch
coop
: review+
Details | Diff | Splinter Review
(Assignee)

Description

10 years ago
Bug 466372 comment 1:
{{
From  Serge Gautherie   2008-11-23 08:19:53 PST

Fwiw, the figures are a little different from the (Windows SeaMonkey)
waterfall:
{
TinderboxPrint: reftest<br/>2326/0/93
}
I would guess
2326 = Pass + Fail (which don't on the tindrboxes) + LoadOnly
93 = +/- "Known Fail" (without the "Random only") + Skip
}}

***

Example of current outputs:

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1228438741.1228442157.13690.gz&fulltext=1
Linux mozilla-central moz2-linux-slave07 dep unit test on 2008/12/04 16:59:01
REFTEST INFO | Pass: 2351
REFTEST INFO | KnownFail: 98
REFTEST INFO | Random: 30
REFTEST INFO | LoadOnly: 4
REFTEST INFO | Skip: 15
TinderboxPrint: reftest<br/>2362/0/113
REFTEST INFO | LoadOnly: 795
REFTEST INFO | Skip: 2
TinderboxPrint: crashtest<br/>799/0/2

***

I don't know where/how the (3) TinderboxPrint numbers are calculated...

But I'd like to suggest to calculate them as the sum of the new items:
{
       8 +  // The errors...
       9 +  Exception: 0,
      10 +  FailedLoad: 0,
      11 +  UnexpectedFail: 0,
      12 +  UnexpectedPass: 0,

      13 +  // The successes...
      14 +  Pass: 0,
      15 +  KnownFail : 0,
      16 +  Random : 0,
      17 +  LoadOnly: 0,

      18 +  // The unknowns.
      19 +  Skip: 0,
}
We should not do it this way.

The first number (passed tests) should be the sum of the Pass and the LoadOnly.  The second number (unexpected results that turn the tree orange) should be the sum of the Exception (maybe, I'm not even sure what this is), FailedLoad, UnexpectedFail, and UnexpectedPass.  The third number (known failures) should be the some of the KnownFail, Skip, and Random.

However, I don't think there's any need to rewrite the code in question, nor do I see how such a rewrite would change the results (except possibly for the Exception category).
(Assignee)

Comment 2

10 years ago
(In reply to comment #1)
> We should not do it this way.

Fine with me.

> However, I don't think there's any need to rewrite the code in question, nor do
> I see how such a rewrite would change the results

At least, I'm puzzled by the 4 extra crashtests reported on the waterfall :-/
Actually, it looks like the current code is broken, so maybe we should do this.  In particular, the current code does this:

REFTEST TEST-FAIL | | EXCEPTION: Error in manifest file file:///builds/slave/trunk_linux-7/build/layout/reftests/text/reftest.list line 28
REFTEST FINISHED: Slowest test took 0ms (undefined)
REFTEST INFO | Result summary:
REFTEST INFO | Exception: 1
REFTEST INFO | Total canvas count = 0
program finished with exit code 0
TinderboxPrint: reftest<br/>5/0/0

Note that it's counting the info lines as passing tests.  We're also printing TEST-FAIL instead of TEST-UNEXPECTED-FAIL for exceptions.
(Assignee)

Comment 4

10 years ago
(In reply to comment #3)
> We're also printing TEST-FAIL instead of TEST-UNEXPECTED-FAIL for exceptions.

Bug 468476 fixed that.
URL: 468476
(Assignee)

Updated

10 years ago
URL: 468476
Depends on: 468476
(Assignee)

Comment 5

10 years ago
(In reply to comment #3)
> Actually, it looks like the current code is broken

What is the (reftest part of) buildbot-configs/*/mozbuild.py used for ?
(compared to buildbotcustom/*/steps.py)
(Assignee)

Comment 6

10 years ago
Created attachment 352027 [details] [diff] [review]
(Av1) Sum the 'REFTEST INFO' counts

Per comment 1 and comment 3.

I did some manual unit-testing of the new code,
but I can't test the file as a whole.

Asking r? from David for the "layout" p-o-v and Ted for the "BuildBot" p-o-v.
Assignee: nobody → sgautherie.bz
Status: NEW → ASSIGNED
Attachment #352027 - Flags: review?(ted.mielczarek)
Attachment #352027 - Flags: review?(dbaron)
Why not do the summing in reftest.js?  (Maybe call them successful, unexpected, and known problems?)  That would also give people the ability to compare their results to the tinderbox results.
reftest.js could perhaps print a summary like this (instead of what it's printing now):

REFTEST INFO | 1500 successful (1400 pass, 100 load only)
REFTEST INFO | 0 unexpected (0 unexpected fail, 0 unexpected pass,
REFTEST INFO |               0 failed load, 0 exceptions)
REFTEST INFO | 160 known problems (100 known fail, 50 random, 10 skipped)

(maybe merging the 2nd and 3rd lines)
(Assignee)

Comment 9

10 years ago
Created attachment 352103 [details] [diff] [review]
(Bv1) New summarized log format

[Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.2a1pre) Gecko/20081209 Minefield/3.2a1pre] (home, optim default) (W2Ksp4)
(http://hg.mozilla.org/mozilla-central/rev/85507cfcdda8)

Per comment 8:
{
REFTEST INFO | Result summary:
REFTEST INFO | Successful: 2410 (2406 pass, 4 load only)
REFTEST INFO | Unexpected: 1 (0 unexpected fail, 1 unexpected pass, 0 failed load, 0 exception)
REFTEST INFO | Known problems: 111 (92 known fail, 13 random, 6 skipped)
}
Attachment #352103 - Flags: review?(dbaron)
(Assignee)

Comment 10

10 years ago
Created attachment 352106 [details] [diff] [review]
(Av2) Use the new summarized 'REFTEST INFO' counts

Per comment 7.

I did some manual unit-testing of the new code,
but I can't test the file as a whole.

Asking r? from David for the "layout" p-o-v and Ted for the "BuildBot" p-o-v.
Attachment #352027 - Attachment is obsolete: true
Attachment #352106 - Flags: review?(ted.mielczarek)
Attachment #352106 - Flags: review?(dbaron)
Attachment #352027 - Flags: review?(ted.mielczarek)
Attachment #352027 - Flags: review?(dbaron)
Comment on attachment 352103 [details] [diff] [review]
(Bv1) New summarized log format

I don't think we want the count != 0 checks around the groups.  We might want them around the items, but I'd say just take them out completely.
Attachment #352103 - Flags: review?(dbaron) → review-
Comment on attachment 352106 [details] [diff] [review]
(Av2) Use the new summarized 'REFTEST INFO' counts

Might it be faster if the regular expression checked that REFTEST was at the beginning of the line?

Also, given the other changes, it seems like you should require all 3 INFO lines to be present, and give an error if they aren't.

Other than that, this looks fine to me, although I'm really not an official reviewer for this code.
Attachment #352106 - Flags: review?(dbaron) → review-
Component: Testing → Reftest
Product: Core → Testing
Version: Trunk → unspecified
(Assignee)

Comment 13

10 years ago
Created attachment 352465 [details] [diff] [review]
(Bv2) New summarized log format

Bv1, with comment 11 suggestion(s).
Attachment #352103 - Attachment is obsolete: true
Attachment #352465 - Flags: review?(dbaron)
Attachment #352465 - Flags: review?(dbaron) → review+
Comment on attachment 352465 [details] [diff] [review]
(Bv2) New summarized log format

r=dbaron

Could you also fix or remove the comments in the initialization of gTestResults, since they currently categorize them incorrectly?
(Assignee)

Comment 15

10 years ago
Created attachment 352630 [details] [diff] [review]
(Bv2a) New summarized log format
[Checkin: Comment 16 & 17]

Bv2, with comment 14 suggestion(s).
Attachment #352465 - Attachment is obsolete: true
(Assignee)

Comment 16

10 years ago
Comment on attachment 352630 [details] [diff] [review]
(Bv2a) New summarized log format
[Checkin: Comment 16 & 17]

http://hg.mozilla.org/mozilla-central/rev/dbe0fab46324
Attachment #352630 - Attachment description: (Bv2a) New summarized log format → (Bv2a) New summarized log format [Checkin: Comment 16]
(Assignee)

Updated

10 years ago
Keywords: checkin-needed
Whiteboard: [c-n: Bv2a baking for 1.9.1]
(Assignee)

Comment 17

10 years ago
Comment on attachment 352630 [details] [diff] [review]
(Bv2a) New summarized log format
[Checkin: Comment 16 & 17]

http://hg.mozilla.org/releases/mozilla-1.9.1/rev/f7cb13b74007
Attachment #352630 - Attachment description: (Bv2a) New summarized log format [Checkin: Comment 16] → (Bv2a) New summarized log format [Checkin: Comment 16 & 17]
(Assignee)

Updated

10 years ago
Keywords: checkin-needed
Whiteboard: [c-n: Bv2a baking for 1.9.1] → [Bv2a is fixed1.9.1]
(Assignee)

Comment 18

10 years ago
Created attachment 352755 [details] [diff] [review]
(Av3) Use the new summarized 'REFTEST INFO' counts

Av2, with comment 12 suggestion(s).

(In reply to comment #12)
> Might it be faster if the regular expression checked that REFTEST was at the
> beginning of the line?

This is what .match() does, != .search() ;-)

> Also, given the other changes, it seems like you should require all 3 INFO
> lines to be present, and give an error if they aren't.

I ported the code used elsewhere in this file.
(I'm planning a followup bug to sync all these...)

Asking r? from David for the "layout" p-o-v and Ben for the "BuildBot" p-o-v.
Attachment #352106 - Attachment is obsolete: true
Attachment #352755 - Flags: review?(dbaron)
Attachment #352755 - Flags: review?(bhearsum)
Attachment #352106 - Flags: review?(ted.mielczarek)
(In reply to comment #18)
> (In reply to comment #12)
> > Might it be faster if the regular expression checked that REFTEST was at the
> > beginning of the line?
> 
> This is what .match() does, != .search() ;-)

How does that expression match, then, given that it doesn't have a .* at the end to match the end of the line?
Comment on attachment 352755 [details] [diff] [review]
(Av3) Use the new summarized 'REFTEST INFO' counts

Er, never mind my previous question... although I don't see why that makes sense as a primitive operation called "match".

This looks fine to me, although bhearsum should be the main reviewer (or designate someone else to be).
Attachment #352755 - Flags: review?(dbaron) → review+
One other thought is that in addition to printing "FAIL" it might be good to make the test actually fail.
(Assignee)

Comment 22

10 years ago
Created attachment 352819 [details] [diff] [review]
(Av3a) Use the new summarized 'REFTEST INFO' counts, ++

Av3, with comment 21 suggestion(s).

*Adds |emphasizeFailureText()| and uses it everywhere.
*Rewrites |summaryText()|:
 *Removes superfluous |knownFail|.
 *Separates |emphasizeFailureText()| calls for counts and leak.
*Fixes nits in Av3 patch.

***

(In reply to comment #21)
> One other thought is that in addition to printing "FAIL" it might be good to
> make the test actually fail.

Did I understood you correctly or were you thinking about something else/more ?
Attachment #352755 - Attachment is obsolete: true
Attachment #352819 - Flags: review?(bhearsum)
Attachment #352755 - Flags: review?(bhearsum)
I was saying that if the reftest output can't be parsed, tinderbox ought to turn orange, not just print "FAIL" in the build's table cell.
(Assignee)

Comment 24

10 years ago
(In reply to comment #23)
> I was saying that if the reftest output can't be parsed, tinderbox ought to
> turn orange, not just print "FAIL" in the build's table cell.

That's what I thought (and I concur), but I'm not sure how to achieve this:
I would guess |evaluateCommand()| needs to be modified ? (How ?)
(Assignee)

Comment 25

10 years ago
Created attachment 352915 [details] [diff] [review]
(Av3b) Use the new summarized 'REFTEST INFO' counts, ++

Av3a, with nits fixed.
Attachment #352819 - Attachment is obsolete: true
Attachment #352915 - Flags: review?(bhearsum)
Attachment #352819 - Flags: review?(bhearsum)
QA Contact: testing → reftest
(Assignee)

Comment 26

10 years ago
Ben, ping ?
Depends on: 468577
(Assignee)

Updated

10 years ago
Attachment #352915 - Flags: review?(ccooper)

Updated

10 years ago
Attachment #352915 - Flags: review?(bhearsum) → review+
Comment on attachment 352915 [details] [diff] [review]
(Av3b) Use the new summarized 'REFTEST INFO' counts, ++

Looks fine to me.
(Assignee)

Updated

10 years ago
Attachment #352915 - Flags: review?(ccooper)
(Assignee)

Updated

10 years ago
Keywords: checkin-needed, fixed1.9.1
Target Milestone: --- → mozilla1.9.1b3
Version: unspecified → Trunk
(Assignee)

Updated

10 years ago
Whiteboard: [Bv2a is fixed1.9.1] → [c-n: Av3b to cvs/1.9.0] [Bv2a is fixed1.9.1]
(Assignee)

Comment 28

10 years ago
Created attachment 355929 [details] [diff] [review]
(Av3c) Use the new summarized 'REFTEST INFO' counts, ++
[Checkin: Comment 30]

Av3b, with a nit fix.

Ben, could you check this in ?
Attachment #352915 - Attachment is obsolete: true
(Assignee)

Updated

10 years ago
Whiteboard: [c-n: Av3b to cvs/1.9.0] [Bv2a is fixed1.9.1] → [c-n: Av3c to cvs/1.9.0] [Bv2a is fixed1.9.1]
Unfortunately due to a bug with our l10n builds I can't do enable this right away. I'll make sure this gets in the next downtime though.
Comment on attachment 355929 [details] [diff] [review]
(Av3c) Use the new summarized 'REFTEST INFO' counts, ++
[Checkin: Comment 30]

Carried forward review and fixed some bitrot.

Checking in steps.py;
/cvsroot/mozilla/tools/buildbotcustom/unittest/steps.py,v  <--  steps.py
new revision: 1.20; previous revision: 1.19
done
Attachment #355929 - Flags: review+
Comment on attachment 355929 [details] [diff] [review]
(Av3c) Use the new summarized 'REFTEST INFO' counts, ++
[Checkin: Comment 30]

These changes will be pushed to the master tomorrow during the downtime.
(Assignee)

Updated

10 years ago
Attachment #355929 - Attachment description: (Av3c) Use the new summarized 'REFTEST INFO' counts, ++ → (Av3c) Use the new summarized 'REFTEST INFO' counts, ++ [Checkin: Comment 30]
(Assignee)

Updated

10 years ago
Depends on: 472861
(Assignee)

Updated

10 years ago
Status: ASSIGNED → RESOLVED
Last Resolved: 10 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Whiteboard: [c-n: Av3c to cvs/1.9.0] [Bv2a is fixed1.9.1] → [Bv2a is fixed1.9.1]
(Assignee)

Updated

10 years ago
Blocks: 473259
(Assignee)

Comment 32

10 years ago
(In reply to comment #5)
> What is the (reftest part of) buildbot-configs/*/mozbuild.py used for ?
> (compared to buildbotcustom/*/steps.py)

Ben, would you know the answer to this question ?

(In reply to comment #18)
> (I'm planning a followup bug to sync all these...)

I filed bug 473259.

(In reply to comment #24)
> (In reply to comment #23)
> > I was saying that if the reftest output can't be parsed, tinderbox ought to
> > turn orange, not just print "FAIL" in the build's table cell.
> 
> That's what I thought (and I concur), but I'm not sure how to achieve this:
> I would guess |evaluateCommand()| needs to be modified ? (How ?)

David, I still don't know more than what I wrote (comment and patch), then feel
free to file a bug about this...
(Assignee)

Comment 33

10 years ago
Created attachment 356650 [details] [diff] [review]
(Cv1) Update evaluateCommand() too

(syntax) Not yet tested, but asking for (early) review.

David, is that what you meant in comment 21 ?
Attachment #356650 - Flags: review?(dbaron)
Attachment #356650 - Flags: review?(ccooper)
Comment on attachment 356650 [details] [diff] [review]
(Cv1) Update evaluateCommand() too

This seems like a good change to make, although it's not what I was talking about in comment 21, and I'm not an appropriate reviewer for this code.

(I was referring to the < 0 checks, I think.)
(In reply to comment #32)
> (In reply to comment #5)
> > What is the (reftest part of) buildbot-configs/*/mozbuild.py used for ?
> > (compared to buildbotcustom/*/steps.py)
> 
> Ben, would you know the answer to this question ?
> 

Are you asking about CVS or Mercurial?

I think you're asking about CVS, in which case...only mozilla/tools/buildbot-configs/testing/unittest and mozilla/tools/buildbot-configs/testing/unittest-stage are relevant. leaktest/ and moz2unit/ are not used anymore.
(Assignee)

Comment 36

10 years ago
Created attachment 356710 [details] [diff] [review]
(Cv1a) Update evaluateCommand() too
[Checkin: Comment 38]

Cv1, fixed and tested.

match() doesn't care for re.MULTILINE mode,
per http://docs.python.org/library/re.html#matching-searching
Attachment #356650 - Attachment is obsolete: true
Attachment #356710 - Flags: review?(ccooper)
Attachment #356650 - Flags: review?(ccooper)
(Assignee)

Comment 37

10 years ago
(In reply to comment #34)
> (I was referring to the < 0 checks, I think.)

Then, adding emphasizeFailureText() in comment 22 was what you wanted, wasn't it ?

(In reply to comment #35)
> (In reply to comment #32)
> > (In reply to comment #5)
> > > What is the (reftest part of) buildbot-configs/*/mozbuild.py used for ?
> > > (compared to buildbotcustom/*/steps.py)
> > 
> > Ben, would you know the answer to this question ?
> 
> Are you asking about CVS or Mercurial?

Cvs, yes; are these files in Hg ?
http://mxr.mozilla.org/mozilla/find?string=%2Funittest.*%2Fmozbuild.py$

These files seem to be "near" copies of steps.py,
so I wonder what they are used for and if they should be kept in sync ?
Attachment #356710 - Flags: review?(ccooper) → review+
Comment on attachment 356710 [details] [diff] [review]
(Cv1a) Update evaluateCommand() too
[Checkin: Comment 38]

Checking in steps.py;
/cvsroot/mozilla/tools/buildbotcustom/unittest/steps.py,v  <--  steps.py
new revision: 1.21; previous revision: 1.20
done
(Assignee)

Updated

10 years ago
Attachment #356710 - Attachment description: (Cv1a) Update evaluateCommand() too → (Cv1a) Update evaluateCommand() too [Checkin: Comment 38]
(In reply to comment #37)
> (In reply to comment #34)
> > (I was referring to the < 0 checks, I think.)
> 
> Then, adding emphasizeFailureText() in comment 22 was what you wanted, wasn't
> it ?

No, I meant making the tinderbox turn orange.
(Assignee)

Comment 40

10 years ago
(In reply to comment #39)
> No, I meant making the tinderbox turn orange.

David, *what* (would) makes the tinderbox turn orange ?
(if not comment 36 evaluateCommand() either)
The case where it prints "FAIL" because it can't parse the log.
(Assignee)

Comment 42

10 years ago
David, I'm sorry but your answers don't seem to be enough for me to understand what should be done (exactly):
could you rather point me to some code (examples) of what/where ?
In that patch you had the code:

+        if successfulCount < 0 or unexpectedCount < 0 or knownProblemsCount < 0:
+            summary += "FAIL\n"

I'm saying that this should also have code to make the tinderbox turn orange.
(Assignee)

Comment 44

10 years ago
(In reply to comment #43)

Oh, let's see:

> In that patch you had the code:

your comment 21 (correctly) referred to Av3 patch.

> +        if successfulCount < 0 or unexpectedCount < 0 or knownProblemsCount <
> 0:
> +            summary += "FAIL\n"

Then, in comment 22 Av3a patch, I enhanced it to
+            summary += '%s\n' % emphasizeFailureText('FAIL')
and that's what (in Av3c patch) got checked in.

> I'm saying that this should also have code to make the tinderbox turn orange.

Then, since your comment 23 answer, I'm under the impression that you think this didn't solve your request.
Did we simply misunderstood each other ?
(Or do you think there actually is something more to do yet ?)
(In reply to comment #44)
> Then, in comment 22 Av3a patch, I enhanced it to
> +            summary += '%s\n' % emphasizeFailureText('FAIL')
> and that's what (in Av3c patch) got checked in.

emphasizeFailureText puts <em class="testfail">...</em> around the text.

> > I'm saying that this should also have code to make the tinderbox turn orange.
> 
> Then, since your comment 23 answer, I'm under the impression that you think
> this didn't solve your request.
> Did we simply misunderstood each other ?
> (Or do you think there actually is something more to do yet ?)

Are you saying that there's code somewhere else in buildbot that magically turns the build orange if there's text with '<em class="testfail">' in it?

If not, there's more to do.
(Assignee)

Updated

10 years ago
Blocks: 478165
(Assignee)

Updated

9 years ago
Whiteboard: [Bv2a is fixed1.9.1] → [fixed1.9.1b3: Bv2a]
You need to log in before you can comment on or make changes to this bug.