Last Comment Bug 629016 - Permaorange reftests on OS X 64
: Permaorange reftests on OS X 64
Status: RESOLVED FIXED
[hardblocker][has patch][blocked on o...
: regression
Product: Core
Classification: Components
Component: Graphics (show other bugs)
: Trunk
: x86 Mac OS X
: -- normal (vote)
: ---
Assigned To: Joe Drew (not getting mail)
:
: Milan Sreckovic [:milan]
Mentors:
Depends on: 629066 629509
Blocks: 621778
  Show dependency treegraph
 
Reported: 2011-01-26 09:00 PST by Joe Drew (not getting mail)
Modified: 2013-12-27 14:27 PST (History)
13 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
betaN+


Attachments
only use OpenGL acceleration on 10.6.3 and above (1.44 KB, patch)
2011-01-27 18:54 PST, Joe Drew (not getting mail)
jmuizelaar: review+
jaas: review+
Details | Diff | Splinter Review
v2: only apply our blocking logic on 10.6 (1.50 KB, patch)
2011-01-28 13:50 PST, Joe Drew (not getting mail)
jmuizelaar: review+
Details | Diff | Splinter Review
Re-enable tests disabled earlier (5.54 KB, patch)
2011-02-03 13:16 PST, Joe Drew (not getting mail)
jmuizelaar: review+
Details | Diff | Splinter Review

Description Joe Drew (not getting mail) 2011-01-26 09:00:43 PST
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/bugs/502288-1.html | image comparison (==)
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/bugs/513153-1a.html | image comparison (==)
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/bugs/513153-1b.html | image comparison (==)
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/bugs/526463-1.html | image comparison (==)
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/font-face/ex-unit-1-dynamic.html | image comparison (==)
Comment 1 Jeff Muizelaar [:jrmuizel] 2011-01-26 09:09:40 PST
These look like an invalidation problem.
Comment 2 Joe Drew (not getting mail) 2011-01-26 10:38:22 PST
I pushed http://hg.mozilla.org/mozilla-central/rev/c06b04702cf8 to disable the reftests until we fix them.
Comment 3 Armen Zambrano [:armenzg] (EDT/UTC-4) 2011-01-26 11:59:41 PST
Unfortunately one of the two snow leopard staging slaves I was using was updating to the wrong puppet server (it's supposed to talk with the puppet staging server rather than production).
I should have been able to catch this by mistrusting our staging systems.
joe I am very sorry.

Here is the run of the staging slave that actually had the right resolution:
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTest/1296066805.1296068280.30586.gz
and would have caught the perma-oranges.
Comment 4 :Ehsan Akhgari 2011-01-26 12:02:09 PST
I disabled another reftest which failed on OSX64 <http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1296070581.1296071394.14625.gz>.  This was intermittent, but it was very similar to the rest of the failures in this bug.

http://hg.mozilla.org/mozilla-central/rev/7b2654a9c599
Comment 5 :Ehsan Akhgari 2011-01-26 12:03:10 PST
Armen: can you confirm that the test in comment 4 was also run on the same slave?
Comment 6 :Ehsan Akhgari 2011-01-26 12:21:15 PST
These failures kept happening on other tests too, which caused me to close mozilla-central.
Comment 7 Armen Zambrano [:armenzg] (EDT/UTC-4) 2011-01-26 12:51:37 PST
I clarified on IRC but let me rephrase it here.

- a slave on staging was using the wrong screen resolution when we were baking there (it was receiving the screen resolution file from production rather than staging)
- the deployment this morning had no issues. all production slaves updated to the right new screen resolution
- up until now we were *blind* for reftests on 10.6; the new screen size allowed us to run reftests with hardware acceleration (disabling hw acceleration on 10.5 urged us to fix the screen resolution on 10.6)
- since the staging runs were wrong; they failed to catch the perma-oranges that joe disabled on comment 2

Here is my stub at trying to understand what our perma-oranges are.
If we look at [3], those are the last 4 perma-oranges we have revealed.

Before joe's change:
- 10.6 opt reftest - 5626/5/230 [1]
- 10.6 debug reftest - 5622/9/250 [2]

With joe's change:
- 10.6 opt reftest - 5626/0/235
- 10.6 debug reftest - 5622/4/255 [3]

Failures on [3] were already on [2].
Fixed failures on [1] are fixed as well on [3].

We need to disable the perma-oranges for [3]

reftest/tests/content/test/reftest/bug591981-1.html appeared after 8149e1a06476
and is a new perma-orange after that changeset.

###### [1] #######
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/bugs/502288-1.html | image comparison (==)
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/bugs/513153-1a.html | image comparison (==)
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/bugs/513153-1b.html | image comparison (==)
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/bugs/526463-1.html | image comparison (==)
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/font-face/ex-unit-1-dynamic.html | image comparison

###### [2] #######
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/bugs/502288-1.html | image comparison (==)
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/bugs/513153-1a.html | image comparison (==)
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/bugs/513153-1b.html | image comparison (==)
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/bugs/526463-1.html | image comparison (==)
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/font-face/ex-unit-1-dynamic.html | image comparison (==)
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/svg/as-image/img-height-meet-2.html | image comparison (==)
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/svg/as-image/img-height-slice-2.html | image comparison (==)
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/svg/as-image/img-width-meet-2.html | image comparison (==)
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/svg/as-image/img-width-slice-2.html | image comparison (==)

####### [3] #######
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/svg/as-image/img-height-meet-2.html | image comparison (==)
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/svg/as-image/img-height-slice-2.html | image comparison (==)
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/svg/as-image/img-width-meet-2.html | image comparison (==)
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/svg/as-image/img-width-slice-2.html | image comparison (==)

####### [4] #######
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/content/test/reftest/bug591981-1.html | image comparison (==)
Comment 8 Daniel Holbert [:dholbert] 2011-01-26 13:35:34 PST
I filed bug 629143 on investigating the img-* failures.  (I don't know about the other ones.)
Comment 9 Daniel Holbert [:dholbert] 2011-01-26 13:58:37 PST
One more near-permaorange since the resolution change (including the log from Comment 3):
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/content/test/reftest/bug591981-1.html | image comparison (==)
Comment 10 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2011-01-26 14:31:14 PST
I tried repro'ing the bug591981-1.html failure and svg/as-image/* ones on desktop linux with GL enabled, but was unable.  Might be a cocoa widgetry issue or some characteristic of the test machines.
Comment 11 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2011-01-26 14:31:41 PST
(In reply to comment #10)
Might be a cocoa widgetry issue or some characteristic of the test machines.
        ^exacerbated by
Comment 12 :Ehsan Akhgari 2011-01-26 14:39:43 PST
One more test disabled:

http://hg.mozilla.org/mozilla-central/rev/b3e0486f4e9b

Also, I made sure that other tests disabled because of this bug are only disabled on OSX64:

http://hg.mozilla.org/mozilla-central/rev/de488f6c12cf
Comment 13 Daniel Holbert [:dholbert] 2011-01-26 15:04:53 PST
(In reply to comment #10)
> I tried repro'ing the bug591981-1.html failure and svg/as-image/* ones on
> desktop linux with GL enabled, but was unable

Same here, on my 64-bit mac mini running OS X 10.6, using one of the failing tinderbox builds (built from 0e7b88440e17).

Not sure why it happens on the tinderboxen macs but not on my mac. :(
Comment 14 Daniel Holbert [:dholbert] 2011-01-26 15:05:32 PST
(GL was enabled by default in the tinderbox build that I tried, btw -- verified in about:support)
Comment 15 Jeff Muizelaar [:jrmuizel] 2011-01-26 15:43:15 PST
I suspect this is caused by:
http://hg.mozilla.org/mozilla-central/rev/31b46f48f714
Comment 17 Joe Drew (not getting mail) 2011-01-26 17:08:54 PST
I have confirmed that this is caused by http://hg.mozilla.org/mozilla-central/rev/31b46f48f714 . Unfortunately, backing out this patch (Bug 621778) is non-trivial, because people built on top of it.

This is a hardblocker because it doesn't just happen on reftests, it happens on real webpages.

This might only happen on 10.6.2; I have not yet been willing to upgrade a computer past that because my own, running 10.6.6, can't reproduce this bug.
Comment 18 :Ehsan Akhgari 2011-01-26 21:21:31 PST
(In reply to comment #17)
> This might only happen on 10.6.2; I have not yet been willing to upgrade a
> computer past that because my own, running 10.6.6, can't reproduce this bug.

I'm on 10.6.5, and I can lend my computer to you for testing tomorrow if you need.
Comment 19 Armen Zambrano [:armenzg] (EDT/UTC-4) 2011-01-27 05:54:03 PST
Is there a value to make a list of which version of 10.6.x it works on?
Or just checking that it works for > 10.6.5 is good enough?
Comment 20 Joe Drew (not getting mail) 2011-01-27 09:36:57 PST
There is value in figuring out what versions this works on, if it is indeed version-specific, because that'll let us more accurately blacklist versions of OS X.
Comment 21 Joe Drew (not getting mail) 2011-01-27 11:47:44 PST
I have confirmed that upgrading a slave to 10.6.6 (from 10.6.2) fixes this bug.
Comment 22 Joe Drew (not getting mail) 2011-01-27 14:15:08 PST
This is not a bug in Firefox. I've verified that all rendering is done properly, but OpenGL puts it in the wrong position. Further, I've verified that updating to 10.6.3 or later fixes this bug.

Unfortunately, this means we need to update all our 10.6 slaves, since we need to blacklist OpenGL-accelerated layers on 10.6.2 and earlier.
Comment 23 Joe Drew (not getting mail) 2011-01-27 18:54:22 PST
Created attachment 507763 [details] [diff] [review]
only use OpenGL acceleration on 10.6.3 and above
Comment 24 Jeff Muizelaar [:jrmuizel] 2011-01-27 21:04:46 PST
Comment on attachment 507763 [details] [diff] [review]
only use OpenGL acceleration on 10.6.3 and above

This will still enable gl in xul accelerated windows. I'm not sure if that's ok or not. Other than that it looks fine.
Comment 25 Bill Gianopoulos [:WG9s] 2011-01-28 07:13:23 PST
Comment on attachment 507763 [details] [diff] [review]
only use OpenGL acceleration on 10.6.3 and above

>+  if (err1 == noErr && err2 == noErr && err3 == noErr &&
>+      major >= 10 && minor >= 6 && bugfix >= 3) {
>+    accelerateByDefault = PR_TRUE;

This if test is not correct.  It will decide that version 11.0.0 is less than 10.6.3.
THis needs to be something more like:
     if (err1 == noErr && err2 == noErr && err3 == noErr &&
        (major > 10 || ((major == 10 && minor > 6) ||
        (major == 10 && minor == 6 && bugfix >=3)))
Comment 26 Bill Gianopoulos [:WG9s] 2011-01-28 07:18:30 PST
I might be simpler to do something like 

      (major * 1000000 + minor *1000 + bugfix) >= 10006003
Comment 27 Jeff Muizelaar [:jrmuizel] 2011-01-28 07:44:39 PST
(In reply to comment #26)
> I might be simpler to do something like 
> 
>       (major * 1000000 + minor *1000 + bugfix) >= 10006003

Indeed. I think this is the best option.
Comment 28 Bill Gianopoulos [:WG9s] 2011-01-28 13:41:27 PST
Hmm. Is the patch just checked in for Bug 628384 going to override this fix?
Comment 29 Joe Drew (not getting mail) 2011-01-28 13:45:21 PST
Yes, but that's ok; our test machines should always be in a working state.
Comment 30 Joe Drew (not getting mail) 2011-01-28 13:50:37 PST
Created attachment 507987 [details] [diff] [review]
v2: only apply our blocking logic on 10.6

Bill's right that the logic before was wrong. I don't see any reason why we should apply our logic to non-10.6 OSes, though, because 10.5 had no problems, and we know nothing about > 10.6. So let's tweak this a little.
Comment 31 Bill Gianopoulos [:WG9s] 2011-01-28 13:54:56 PST
(In reply to comment #29)
> Yes, but that's ok; our test machines should always be in a working state.

I thought the point of this patch was to fix the orange on the test machines by blacklisting them because they were an old OS.
Comment 32 Joe Drew (not getting mail) 2011-01-28 13:59:11 PST
No, we're going to fix our test machines by updating them to 10.6.6 (bug 629509). This is to fix the few users who are still on 10.6.0, 10.6.1, and 10.6.2.
Comment 33 Bill Gianopoulos [:WG9s] 2011-01-28 14:07:54 PST
(In reply to comment #32)
> No, we're going to fix our test machines by updating them to 10.6.6 (bug
> 629509). This is to fix the few users who are still on 10.6.0, 10.6.1, and
> 10.6.2.

OK.  Thanks for the clarification.
Comment 34 Jeff Muizelaar [:jrmuizel] 2011-01-28 14:10:20 PST
Comment on attachment 507987 [details] [diff] [review]
v2: only apply our blocking logic on 10.6

Maybe this should've been done in the blacklisting code. That would make it easier to expose the reason for the block to about:support.
Comment 35 Joe Drew (not getting mail) 2011-02-03 13:16:54 PST
Created attachment 509545 [details] [diff] [review]
Re-enable tests disabled earlier

Now that we have 10.6.2 and below blocked for OpenGL, we can re-enable the tests we disabled because the OS X bug caused them to fail.
Comment 37 pal-moz 2011-02-08 14:38:49 PST
is this included not in beta 11, but in beta 12 ?
Comment 38 Daniel Holbert [:dholbert] 2011-02-08 14:51:58 PST
I believe you're correct -- at least at one point, the beta11 was tagged at
  http://hg.mozilla.org/mozilla-central/rev/49c3ca45a79f
which is older than the csets in Comment 36 (based on |hg log|).

Note You need to log in before you can comment on or make changes to this bug.