Last Comment Bug 629016 - Permaorange reftests on OS X 64
: Permaorange reftests on OS X 64
Status: RESOLVED FIXED
[hardblocker][has patch][blocked on o...
: regression
Product: Core
Classification: Components
Component: Graphics (show other bugs)
: Trunk
: x86 Mac OS X
-- normal (vote)
: ---
Assigned To: Joe Drew (not getting mail)
:
: Milan Sreckovic [:milan]
Mentors:
Depends on: 629066 629509
Blocks: 621778
  Show dependency treegraph
 
Reported: 2011-01-26 09:00 PST by Joe Drew (not getting mail)
Modified: 2013-12-27 14:27 PST (History)
13 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
betaN+


Attachments
only use OpenGL acceleration on 10.6.3 and above (1.44 KB, patch)
2011-01-27 18:54 PST, Joe Drew (not getting mail)
jmuizelaar: review+
jaas: review+
Details | Diff | Splinter Review
v2: only apply our blocking logic on 10.6 (1.50 KB, patch)
2011-01-28 13:50 PST, Joe Drew (not getting mail)
jmuizelaar: review+
Details | Diff | Splinter Review
Re-enable tests disabled earlier (5.54 KB, patch)
2011-02-03 13:16 PST, Joe Drew (not getting mail)
jmuizelaar: review+
Details | Diff | Splinter Review

Description User image Joe Drew (not getting mail) 2011-01-26 09:00:43 PST
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/bugs/502288-1.html | image comparison (==)
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/bugs/513153-1a.html | image comparison (==)
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/bugs/513153-1b.html | image comparison (==)
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/bugs/526463-1.html | image comparison (==)
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/font-face/ex-unit-1-dynamic.html | image comparison (==)
Comment 1 User image Jeff Muizelaar [:jrmuizel] 2011-01-26 09:09:40 PST
These look like an invalidation problem.
Comment 2 User image Joe Drew (not getting mail) 2011-01-26 10:38:22 PST
I pushed http://hg.mozilla.org/mozilla-central/rev/c06b04702cf8 to disable the reftests until we fix them.
Comment 3 User image Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2011-01-26 11:59:41 PST
Unfortunately one of the two snow leopard staging slaves I was using was updating to the wrong puppet server (it's supposed to talk with the puppet staging server rather than production).
I should have been able to catch this by mistrusting our staging systems.
joe I am very sorry.

Here is the run of the staging slave that actually had the right resolution:
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTest/1296066805.1296068280.30586.gz
and would have caught the perma-oranges.
Comment 4 User image :Ehsan Akhgari 2011-01-26 12:02:09 PST
I disabled another reftest which failed on OSX64 <http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1296070581.1296071394.14625.gz>.  This was intermittent, but it was very similar to the rest of the failures in this bug.

http://hg.mozilla.org/mozilla-central/rev/7b2654a9c599
Comment 5 User image :Ehsan Akhgari 2011-01-26 12:03:10 PST
Armen: can you confirm that the test in comment 4 was also run on the same slave?
Comment 6 User image :Ehsan Akhgari 2011-01-26 12:21:15 PST
These failures kept happening on other tests too, which caused me to close mozilla-central.
Comment 7 User image Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2011-01-26 12:51:37 PST
I clarified on IRC but let me rephrase it here.

- a slave on staging was using the wrong screen resolution when we were baking there (it was receiving the screen resolution file from production rather than staging)
- the deployment this morning had no issues. all production slaves updated to the right new screen resolution
- up until now we were *blind* for reftests on 10.6; the new screen size allowed us to run reftests with hardware acceleration (disabling hw acceleration on 10.5 urged us to fix the screen resolution on 10.6)
- since the staging runs were wrong; they failed to catch the perma-oranges that joe disabled on comment 2

Here is my stub at trying to understand what our perma-oranges are.
If we look at [3], those are the last 4 perma-oranges we have revealed.

Before joe's change:
- 10.6 opt reftest - 5626/5/230 [1]
- 10.6 debug reftest - 5622/9/250 [2]

With joe's change:
- 10.6 opt reftest - 5626/0/235
- 10.6 debug reftest - 5622/4/255 [3]

Failures on [3] were already on [2].
Fixed failures on [1] are fixed as well on [3].

We need to disable the perma-oranges for [3]

reftest/tests/content/test/reftest/bug591981-1.html appeared after 8149e1a06476
and is a new perma-orange after that changeset.

###### [1] #######
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/bugs/502288-1.html | image comparison (==)
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/bugs/513153-1a.html | image comparison (==)
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/bugs/513153-1b.html | image comparison (==)
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/bugs/526463-1.html | image comparison (==)
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/font-face/ex-unit-1-dynamic.html | image comparison

###### [2] #######
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/bugs/502288-1.html | image comparison (==)
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/bugs/513153-1a.html | image comparison (==)
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/bugs/513153-1b.html | image comparison (==)
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/bugs/526463-1.html | image comparison (==)
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/font-face/ex-unit-1-dynamic.html | image comparison (==)
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/svg/as-image/img-height-meet-2.html | image comparison (==)
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/svg/as-image/img-height-slice-2.html | image comparison (==)
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/svg/as-image/img-width-meet-2.html | image comparison (==)
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/svg/as-image/img-width-slice-2.html | image comparison (==)

####### [3] #######
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/svg/as-image/img-height-meet-2.html | image comparison (==)
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/svg/as-image/img-height-slice-2.html | image comparison (==)
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/svg/as-image/img-width-meet-2.html | image comparison (==)
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/svg/as-image/img-width-slice-2.html | image comparison (==)

####### [4] #######
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/content/test/reftest/bug591981-1.html | image comparison (==)
Comment 8 User image Daniel Holbert [:dholbert] (vacation, returning 2/27) 2011-01-26 13:35:34 PST
I filed bug 629143 on investigating the img-* failures.  (I don't know about the other ones.)
Comment 9 User image Daniel Holbert [:dholbert] (vacation, returning 2/27) 2011-01-26 13:58:37 PST
One more near-permaorange since the resolution change (including the log from Comment 3):
REFTEST TEST-UNEXPECTED-FAIL | file:///Users/cltbld/talos-slave/test/build/reftest/tests/content/test/reftest/bug591981-1.html | image comparison (==)
Comment 10 User image Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2011-01-26 14:31:14 PST
I tried repro'ing the bug591981-1.html failure and svg/as-image/* ones on desktop linux with GL enabled, but was unable.  Might be a cocoa widgetry issue or some characteristic of the test machines.
Comment 11 User image Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2011-01-26 14:31:41 PST
(In reply to comment #10)
Might be a cocoa widgetry issue or some characteristic of the test machines.
        ^exacerbated by
Comment 12 User image :Ehsan Akhgari 2011-01-26 14:39:43 PST
One more test disabled:

http://hg.mozilla.org/mozilla-central/rev/b3e0486f4e9b

Also, I made sure that other tests disabled because of this bug are only disabled on OSX64:

http://hg.mozilla.org/mozilla-central/rev/de488f6c12cf
Comment 13 User image Daniel Holbert [:dholbert] (vacation, returning 2/27) 2011-01-26 15:04:53 PST
(In reply to comment #10)
> I tried repro'ing the bug591981-1.html failure and svg/as-image/* ones on
> desktop linux with GL enabled, but was unable

Same here, on my 64-bit mac mini running OS X 10.6, using one of the failing tinderbox builds (built from 0e7b88440e17).

Not sure why it happens on the tinderboxen macs but not on my mac. :(
Comment 14 User image Daniel Holbert [:dholbert] (vacation, returning 2/27) 2011-01-26 15:05:32 PST
(GL was enabled by default in the tinderbox build that I tried, btw -- verified in about:support)
Comment 15 User image Jeff Muizelaar [:jrmuizel] 2011-01-26 15:43:15 PST
I suspect this is caused by:
http://hg.mozilla.org/mozilla-central/rev/31b46f48f714
Comment 17 User image Joe Drew (not getting mail) 2011-01-26 17:08:54 PST
I have confirmed that this is caused by http://hg.mozilla.org/mozilla-central/rev/31b46f48f714 . Unfortunately, backing out this patch (Bug 621778) is non-trivial, because people built on top of it.

This is a hardblocker because it doesn't just happen on reftests, it happens on real webpages.

This might only happen on 10.6.2; I have not yet been willing to upgrade a computer past that because my own, running 10.6.6, can't reproduce this bug.
Comment 18 User image :Ehsan Akhgari 2011-01-26 21:21:31 PST
(In reply to comment #17)
> This might only happen on 10.6.2; I have not yet been willing to upgrade a
> computer past that because my own, running 10.6.6, can't reproduce this bug.

I'm on 10.6.5, and I can lend my computer to you for testing tomorrow if you need.
Comment 19 User image Armen Zambrano - Back on March 27th [:armenzg] (EDT/UTC-4) 2011-01-27 05:54:03 PST
Is there a value to make a list of which version of 10.6.x it works on?
Or just checking that it works for > 10.6.5 is good enough?
Comment 20 User image Joe Drew (not getting mail) 2011-01-27 09:36:57 PST
There is value in figuring out what versions this works on, if it is indeed version-specific, because that'll let us more accurately blacklist versions of OS X.
Comment 21 User image Joe Drew (not getting mail) 2011-01-27 11:47:44 PST
I have confirmed that upgrading a slave to 10.6.6 (from 10.6.2) fixes this bug.
Comment 22 User image Joe Drew (not getting mail) 2011-01-27 14:15:08 PST
This is not a bug in Firefox. I've verified that all rendering is done properly, but OpenGL puts it in the wrong position. Further, I've verified that updating to 10.6.3 or later fixes this bug.

Unfortunately, this means we need to update all our 10.6 slaves, since we need to blacklist OpenGL-accelerated layers on 10.6.2 and earlier.
Comment 23 User image Joe Drew (not getting mail) 2011-01-27 18:54:22 PST
Created attachment 507763 [details] [diff] [review]
only use OpenGL acceleration on 10.6.3 and above
Comment 24 User image Jeff Muizelaar [:jrmuizel] 2011-01-27 21:04:46 PST
Comment on attachment 507763 [details] [diff] [review]
only use OpenGL acceleration on 10.6.3 and above

This will still enable gl in xul accelerated windows. I'm not sure if that's ok or not. Other than that it looks fine.
Comment 25 User image Bill Gianopoulos [:WG9s] 2011-01-28 07:13:23 PST
Comment on attachment 507763 [details] [diff] [review]
only use OpenGL acceleration on 10.6.3 and above

>+  if (err1 == noErr && err2 == noErr && err3 == noErr &&
>+      major >= 10 && minor >= 6 && bugfix >= 3) {
>+    accelerateByDefault = PR_TRUE;

This if test is not correct.  It will decide that version 11.0.0 is less than 10.6.3.
THis needs to be something more like:
     if (err1 == noErr && err2 == noErr && err3 == noErr &&
        (major > 10 || ((major == 10 && minor > 6) ||
        (major == 10 && minor == 6 && bugfix >=3)))
Comment 26 User image Bill Gianopoulos [:WG9s] 2011-01-28 07:18:30 PST
I might be simpler to do something like 

      (major * 1000000 + minor *1000 + bugfix) >= 10006003
Comment 27 User image Jeff Muizelaar [:jrmuizel] 2011-01-28 07:44:39 PST
(In reply to comment #26)
> I might be simpler to do something like 
> 
>       (major * 1000000 + minor *1000 + bugfix) >= 10006003

Indeed. I think this is the best option.
Comment 28 User image Bill Gianopoulos [:WG9s] 2011-01-28 13:41:27 PST
Hmm. Is the patch just checked in for Bug 628384 going to override this fix?
Comment 29 User image Joe Drew (not getting mail) 2011-01-28 13:45:21 PST
Yes, but that's ok; our test machines should always be in a working state.
Comment 30 User image Joe Drew (not getting mail) 2011-01-28 13:50:37 PST
Created attachment 507987 [details] [diff] [review]
v2: only apply our blocking logic on 10.6

Bill's right that the logic before was wrong. I don't see any reason why we should apply our logic to non-10.6 OSes, though, because 10.5 had no problems, and we know nothing about > 10.6. So let's tweak this a little.
Comment 31 User image Bill Gianopoulos [:WG9s] 2011-01-28 13:54:56 PST
(In reply to comment #29)
> Yes, but that's ok; our test machines should always be in a working state.

I thought the point of this patch was to fix the orange on the test machines by blacklisting them because they were an old OS.
Comment 32 User image Joe Drew (not getting mail) 2011-01-28 13:59:11 PST
No, we're going to fix our test machines by updating them to 10.6.6 (bug 629509). This is to fix the few users who are still on 10.6.0, 10.6.1, and 10.6.2.
Comment 33 User image Bill Gianopoulos [:WG9s] 2011-01-28 14:07:54 PST
(In reply to comment #32)
> No, we're going to fix our test machines by updating them to 10.6.6 (bug
> 629509). This is to fix the few users who are still on 10.6.0, 10.6.1, and
> 10.6.2.

OK.  Thanks for the clarification.
Comment 34 User image Jeff Muizelaar [:jrmuizel] 2011-01-28 14:10:20 PST
Comment on attachment 507987 [details] [diff] [review]
v2: only apply our blocking logic on 10.6

Maybe this should've been done in the blacklisting code. That would make it easier to expose the reason for the block to about:support.
Comment 35 User image Joe Drew (not getting mail) 2011-02-03 13:16:54 PST
Created attachment 509545 [details] [diff] [review]
Re-enable tests disabled earlier

Now that we have 10.6.2 and below blocked for OpenGL, we can re-enable the tests we disabled because the OS X bug caused them to fail.
Comment 37 User image pal-moz 2011-02-08 14:38:49 PST
is this included not in beta 11, but in beta 12 ?
Comment 38 User image Daniel Holbert [:dholbert] (vacation, returning 2/27) 2011-02-08 14:51:58 PST
I believe you're correct -- at least at one point, the beta11 was tagged at
  http://hg.mozilla.org/mozilla-central/rev/49c3ca45a79f
which is older than the csets in Comment 36 (based on |hg log|).

Note You need to log in before you can comment on or make changes to this bug.