Closed Bug 549672 Opened 14 years ago Closed 14 years ago

Linux ix machines failing many unit tests every time

Categories

(Release Engineering :: General, defect)

x86
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: philor, Assigned: bhearsum)

References

Details

(Keywords: intermittent-failure)

Attachments

(1 file)

I don't have a great way of finding every single time that an -ix- machine has run mochitest-other on Linux, but in the last 24 hours, every single one I've found has failed:

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1267548048.1267549160.19480.gz
Linux mozilla-central opt test mochitest-other on 2010/03/02 08:40:48
s: mv-moz2-linux-ix-slave03
7727 ERROR TEST-UNEXPECTED-FAIL | chrome://mochikit/content/chrome/toolkit/content/tests/chrome/test_largemenu.xul | context menu more space above top - got -778, expected -344
7729 ERROR TEST-UNEXPECTED-FAIL | chrome://mochikit/content/chrome/toolkit/content/tests/chrome/test_largemenu.xul | context menu too big either side top
TEST-UNEXPECTED-FAIL | chrome://mochikit/content/browser/browser/components/sessionstore/test/browser/browser_354894.js | browser windows while running testOpenCloseRestoreFromPopup (getBrowserState) - Got 2, expected 1
TEST-UNEXPECTED-FAIL | chrome://mochikit/content/browser/browser/components/sessionstore/test/browser/browser_354894.js | browser windows after testOpenCloseRestoreFromPopup (getBrowserState) - Got 2, expected 1
TEST-UNEXPECTED-FAIL | chrome://mochikit/content/browser/browser/components/sessionstore/test/browser/browser_354894.js | browser windows after testNotificationCount (getBrowserState) - Got 2, expected 1
TEST-UNEXPECTED-FAIL | chrome://mochikit/content/browser/browser/components/sessionstore/test/browser/browser_394759.js | oldState in test_purge has 2 windows instead of 1
TEST-UNEXPECTED-FAIL | chrome://mochikit/content/browser/browser/components/sessionstore/test/browser/browser_394759.js | number of browser windows after test_purge - Got 2, expected 1
TEST-UNEXPECTED-FAIL | chrome://mochikit/content/browser/browser/components/sessionstore/test/browser/browser_394759.js | Found an unexpected browser window at the end of test run
TEST-UNEXPECTED-FAIL | chrome://mochikit/content/browser/browser/components/sessionstore/test/browser/browser_394759_privatebrowsing.js | Found an unexpected browser window at the end of test run
TEST-UNEXPECTED-FAIL | chrome://mochikit/content/browser/browser/components/sessionstore/test/browser/browser_394759_privatebrowsing.js | Found an unexpected browser window at the end of test run
TEST-UNEXPECTED-FAIL | chrome://mochikit/content/browser/browser/components/sessionstore/test/browser/browser_522545.js | Only one browser window should be open eventually - Got 5, expected 1
TEST-UNEXPECTED-FAIL | chrome://mochikit/content/browser/browser/components/sessionstore/test/browser/browser_522545.js | Found an unexpected browser window at the end of test run
TEST-UNEXPECTED-FAIL | chrome://mochikit/content/browser/browser/components/sessionstore/test/browser/browser_522545.js | Found an unexpected browser window at the end of test run
TEST-UNEXPECTED-FAIL | chrome://mochikit/content/browser/browser/components/sessionstore/test/browser/browser_522545.js | Found an unexpected browser window at the end of test run
TEST-UNEXPECTED-FAIL | chrome://mochikit/content/browser/browser/components/sessionstore/test/browser/browser_522545.js | Found an unexpected browser window at the end of test run
TEST-UNEXPECTED-FAIL | chrome://mochikit/content/browser/browser/components/sessionstore/test/browser/browser_528776.js | number of open browser windows according to getBrowserState - Got 9, expected 1
TEST-UNEXPECTED-FAIL | chrome://mochikit/content/browser/browser/components/sessionstore/test/browser/browser_528776.js | number of open browser windows according to getBrowserState - Got 10, expected 2
TEST-UNEXPECTED-FAIL | chrome://mochikit/content/browser/browser/components/sessionstore/test/browser/browser_528776.js | number of open browser windows according to getBrowserState - Got 9, expected 1

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1267529363.1267529970.7508.gz
Linux mozilla-central opt test mochitest-other on 2010/03/02 03:29:23
s: mv-moz2-linux-ix-slave10

(same set plus one test_largemenu.xul)

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1267531058.1267533173.20962.gz
Linux mozilla-central debug test mochitest-other on 2010/03/02 03:57:38
s: mv-moz2-linux-ix-slave02
7997 ERROR TEST-UNEXPECTED-FAIL | chrome://mochikit/content/chrome/toolkit/content/tests/chrome/test_titlebar.xul | move window horizontal - got 200, expected 220
(bug 543760, except that hasn't failed for almost a month, and when it did it didn't fail just one horizontal)

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1267512222.1267514419.15196.gz
Linux mozilla-central opt test mochitest-other on 2010/03/01 22:43:42
s: mv-moz2-linux-ix-slave10
(same set as the second)

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1267512921.1267517166.25335.gz
Linux mozilla-central debug test mochitest-other on 2010/03/01 22:55:21
s: mv-moz2-linux-ix-slave05
(the four test_largemenu.xul ones, plus the single test_titlebar.xul one)
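For tallying failures like the ones above without reading each log by hand, something along these lines may help. It is only a sketch: it assumes the tinderbox log has already been downloaded and decompressed to a local file, and the function name count_failures is made up for illustration. It counts TEST-UNEXPECTED-FAIL lines per test file.

```sh
# Hypothetical helper: count TEST-UNEXPECTED-FAIL lines per test file.
# $1 is the path to a locally saved, decompressed log (an assumption).
count_failures() {
  grep 'TEST-UNEXPECTED-FAIL' "$1" |
    awk -F' [|] ' '{
      # Field 2 is the test URL; keep only the file name after the last "/".
      n = split($2, parts, "/")
      count[parts[n]]++
    }
    END { for (t in count) print count[t], t }' |
    sort -k1,1nr -k2
}
```

Run it as count_failures followed by the path to the saved log; the test file with the most failures is printed first.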
IIRC this test is dependent on the screen size. I'm guessing these machines might not have a "fake screen" attached, or their fake screen size is too small?
(In reply to comment #1)
> IIRC this test is dependent on the screen size. I'm guessing these machines
> might not have a "fake screen" attached, or their fake screen size is too
> small?

Phil/Gavin: sorry about this. I'm not sure how this made it past testing in the staging environment before we pushed them live. Investigating.
Blocks: 545801
To be paranoid, we're switching these Linux builders back to staging until we figure this out.

Over to bear.
Assignee: nobody → bear
Depends on: 549722
Don't understand where this is coming from. Both VMs and the hardware are running 'Xvfb :2 -screen 0 1280x1024x24' and 'metacity --display :2 --replace' and test with 'DISPLAY=:2' in the environment.
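Put together, the display setup described above would look roughly like this. The three commands and the :2 display number are taken from the comment; the function name and the pause between commands are assumptions added for illustration, not a record of how the slaves are actually configured.

```sh
# Sketch only: start the virtual display the tests run against.
DISPLAY_NUM=":2"
SCREEN_GEOMETRY="1280x1024x24"

start_test_display() {  # hypothetical name
  # Virtual framebuffer X server: screen 0 at 1280x1024, 24-bit depth.
  Xvfb "$DISPLAY_NUM" -screen 0 "$SCREEN_GEOMETRY" &
  # Crude pause so the X server is up before the window manager connects
  # (an assumption; not mentioned in this bug).
  sleep 2
  # Window manager on the same display.
  metacity --display "$DISPLAY_NUM" --replace &
  # Tests are then run with this in their environment.
  export DISPLAY="$DISPLAY_NUM"
}
```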
Pulled ix slaves 02,03, 05-10, 12-25 from production to give bear time to get his vpn on. Idle slaves had buildbot stopped, busy slaves will not start buildbot after their current job is done. Renamed buildbot.tac to buildbot.tac.off to effect this. Slave 11 has other problems, bug 546424. Still working on 4.
mv-moz2-linux-ix-slave04 is supposed to be in a reboot loop, for bug 546424 again, and 01 is permanently in staging, so that's all of them.
I've done the following on mv-moz2-linux-ix-slave02,03,05-08 (as cltbld via ssh):

rm -rf /builds/slave/*/ &
su - -c 'rsync -av --delete ~cltbld/.ssh_staging/ ~cltbld/.ssh/'
# enter root password

sed -i -e "s/production-master.*/staging-master.build.mozilla.org\'/" \
  /builds/slave/buildbot.tac.off
fg

# wait for rm to finish
mv /builds/slave/buildbot.tac{.off,}
buildbot start /builds/slave

Still to do: mv-moz2-linux-ix-slave09,10,12-25.
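Since the same steps are repeated on every slave, it can help to expand the shorthand list into full hostnames first. A minimal sketch, assuming seq is available; expand_slaves is a made-up helper name, and the prefix and number list are the ones used in this bug.

```sh
# Hypothetical helper: expand "02,03,05-08" into full hostnames.
# $1 = hostname prefix, $2 = comma-separated numbers and ranges.
expand_slaves() {
  echo "$2" | tr ',' '\n' | while IFS=- read -r first last; do
    # A single number is treated as a one-element range.
    [ -n "$last" ] || last=$first
    # %02g keeps the two-digit, zero-padded numbering.
    seq -f "$1%02g" "$first" "$last"
  done
}
```

For example, expand_slaves mv-moz2-linux-ix-slave 02,03,05-08 prints six hostnames, one per line: slave02, slave03, and slave05 through slave08.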
Went ahead and did mv-moz2-linux-ix-slave09,10,12,14-25. That's everything between 01 and 25 except:
* mv-moz2-linux-ix-slave04 and 11 for bug 546424 (grub issues)
* mv-moz2-linux-ix-slave13, which I couldn't log in to; perhaps it still has the temp passwords used by the suppliers?

To do:
* convert slave13 to staging
* figure out why test_largemenu.xul and friends are failing
No longer depends on: 549722
Assignee: bear → bhearsum
(In reply to comment #8)
> * convert slave13 to staging

The filesystem was in rough shape for some reason, actually. It was fine after fscking. It's back in staging now.

> * figure out why test_largemenu.xul and friends are failing

I started looking into this. So far I've run it by hand while watching over VNC. It looks fine from there. The window is exactly the same size as it is on a VM.

I set an alert() to happen shortly after one of the pop-ups in this test and *that* window is the same size on a VM and ix machine, too.

I've also verified that all the same versions of X, GTK, et al. are installed on both, and they are.

At this point it would be really helpful to have some developer support here. I'm not sure how to go about debugging this test.

Neil, if I give you access to an environment where this test fails could you help me debug it?
Is the test being run on a machine with differing screen characteristics - more than one, different resolution, depth, etc? What OS UI is visible (taskbars)?

I can debug it if you'd like.
(In reply to comment #10)
> Is the test being run on a machine with differing screen characteristics - more
> than one, different resolution, depth, etc? What OS UI is visible (taskbars)?

Both the VM (which passes the test) and the full-hardware machine (which fails) are the same. Here's a dump from xdpyinfo:
screen #0:
  dimensions:    1280x1024 pixels (325x260 millimeters)
  resolution:    100x100 dots per inch
  depths (7):    1, 4, 8, 16, 24, 32, 24
  root window id:    0x40
  depth of root window:    24 planes
  number of colormaps:    minimum 1, maximum 1
  default colormap:    0x20
  default number of colormap cells:    256
  preallocated pixels:    black 0, white 16777215
  options:    backing-store NO, save-unders NO
  largest cursor:    1280x1024

> I can debug it if you'd like.

That'd be great -- I'll catch you on IRC to hook you up with access to the machine.
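To compare the two machines mechanically rather than by eye, the relevant lines of the xdpyinfo output can be extracted and compared. A rough sketch, assuming the output from each machine has been saved to a file; screen_summary and same_screen are hypothetical names.

```sh
# Hypothetical helpers for comparing saved xdpyinfo output.
# Print the screen size, resolution, and root depth lines, whitespace-normalized.
screen_summary() {
  grep -E '^ *(dimensions|resolution|depth of root window):' "$1" |
    sed -E 's/^ +//; s/: +/: /'
}

# Succeed only if both saved dumps report the same screen characteristics.
same_screen() {
  [ "$(screen_summary "$1")" = "$(screen_summary "$2")" ]
}
```

Calling same_screen with the two saved files returns success when the three lines match. This only checks what X reports, so it cannot rule out differences in timing or window manager behavior.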
I caught up with Enn over IRC and we're currently working on getting him access to a machine this fails on.
Depends on: 551476
From IRC:
15:16 <Enn> ok, I think the problem is that popups are being moved 
            asynchronously on linux. may be related to bug 472675
15:16 <Enn> works ok if I put some timers in
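The idea behind "works ok if I put some timers in" is to poll for the expected state instead of asserting immediately after the popup opens. The actual change to the test is not shown in this bug; the following is only a generic illustration of the polling pattern, with wait_until as a made-up helper name.

```sh
# Hypothetical helper: re-run a command until it succeeds.
# $1 = maximum attempts, $2 = seconds to sleep between attempts,
# remaining arguments = the command that checks the condition.
wait_until() {
  wu_tries=$1
  wu_delay=$2
  shift 2
  while [ "$wu_tries" -gt 0 ]; do
    # Stop as soon as the condition holds.
    "$@" && return 0
    wu_tries=$((wu_tries - 1))
    sleep "$wu_delay"
  done
  # The condition never held within the allowed attempts.
  return 1
}
```

A bounded loop like this passes as soon as the asynchronous move has happened, and still fails in finite time if it never does.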
To sum up the status here:

> http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1267548048.1267549160.19480.gz
> Linux mozilla-central opt test mochitest-other on 2010/03/02 08:40:48
> s: mv-moz2-linux-ix-slave03
> 7727 ERROR TEST-UNEXPECTED-FAIL |
> chrome://mochikit/content/chrome/toolkit/content/tests/chrome/test_largemenu.xul
> | context menu more space above top - got -778, expected -344
> 7729 ERROR TEST-UNEXPECTED-FAIL |
> chrome://mochikit/content/chrome/toolkit/content/tests/chrome/test_largemenu.xul

Working with Enn to find a fix for these tests.

> | context menu too big either side top
> TEST-UNEXPECTED-FAIL |
> chrome://mochikit/content/browser/browser/components/sessionstore/test/browser/browser_354894.js
> | browser windows while running testOpenCloseRestoreFromPopup (getBrowserState)
> - Got 2, expected 1
> TEST-UNEXPECTED-FAIL |
> chrome://mochikit/content/browser/browser/components/sessionstore/test/browser/browser_354894.js
> | browser windows after testOpenCloseRestoreFromPopup (getBrowserState) - Got
> 2, expected 1
> TEST-UNEXPECTED-FAIL |
> chrome://mochikit/content/browser/browser/components/sessionstore/test/browser/browser_354894.js
> | browser windows after testNotificationCount (getBrowserState) - Got 2,
> expected 1
> TEST-UNEXPECTED-FAIL |
> chrome://mochikit/content/browser/browser/components/sessionstore/test/browser/browser_394759.js
> | oldState in test_purge has 2 windows instead of 1
> TEST-UNEXPECTED-FAIL |
> chrome://mochikit/content/browser/browser/components/sessionstore/test/browser/browser_394759.js
> | number of browser windows after test_purge - Got 2, expected 1
> TEST-UNEXPECTED-FAIL |
> chrome://mochikit/content/browser/browser/components/sessionstore/test/browser/browser_394759.js
> | Found an unexpected browser window at the end of test run
> TEST-UNEXPECTED-FAIL |
> chrome://mochikit/content/browser/browser/components/sessionstore/test/browser/browser_394759_privatebrowsing.js
> | Found an unexpected browser window at the end of test run
> TEST-UNEXPECTED-FAIL |
> chrome://mochikit/content/browser/browser/components/sessionstore/test/browser/browser_394759_privatebrowsing.js

All of these files are mentioned in two existing bugs: https://bugzilla.mozilla.org/show_bug.cgi?id=518970 and https://bugzilla.mozilla.org/show_bug.cgi?id=528219.

> | Found an unexpected browser window at the end of test run
> TEST-UNEXPECTED-FAIL |
> chrome://mochikit/content/browser/browser/components/sessionstore/test/browser/browser_522545.js
> | Only one browser window should be open eventually - Got 5, expected 1
> TEST-UNEXPECTED-FAIL |
> chrome://mochikit/content/browser/browser/components/sessionstore/test/browser/browser_522545.js
> | Found an unexpected browser window at the end of test run
> TEST-UNEXPECTED-FAIL |
> chrome://mochikit/content/browser/browser/components/sessionstore/test/browser/browser_522545.js
> | Found an unexpected browser window at the end of test run
> TEST-UNEXPECTED-FAIL |
> chrome://mochikit/content/browser/browser/components/sessionstore/test/browser/browser_522545.js
> | Found an unexpected browser window at the end of test run
> TEST-UNEXPECTED-FAIL |
> chrome://mochikit/content/browser/browser/components/sessionstore/test/browser/browser_522545.js

This file has other bugs filed on it, some of which talk about timing problems. Might be the case for this, too:
https://bugzilla.mozilla.org/show_bug.cgi?id=528699
https://bugzilla.mozilla.org/show_bug.cgi?id=537061

> | Found an unexpected browser window at the end of test run
> TEST-UNEXPECTED-FAIL |
> chrome://mochikit/content/browser/browser/components/sessionstore/test/browser/browser_528776.js
> | number of open browser windows according to getBrowserState - Got 9, expected
> 1
> TEST-UNEXPECTED-FAIL |
> chrome://mochikit/content/browser/browser/components/sessionstore/test/browser/browser_528776.js
> | number of open browser windows according to getBrowserState - Got 10,
> expected 2
> TEST-UNEXPECTED-FAIL |
> chrome://mochikit/content/browser/browser/components/sessionstore/test/browser/browser_528776.js
> | number of open browser windows according to getBrowserState - Got 9, expected
> 1

Can't find anything on this one.
Summary: Linux ix machines failing test_largemenu.xul and others every time → Linux ix machines failing many unit tests every time
One failure chunk here is bug 518970.  This is perfect, in that we now have a 100% reproducible failure here.  :-)
Depends on: 518970
(In reply to comment #15)
> One failure chunk here is bug 518970.  This is perfect, in that we now have a
> 100% reproducible failure here.  :-)

I found and fixed the problem with bug 518970.  That part won't be an issue here after my patch in that bug lands.
(In reply to comment #17)
> Created an attachment (id=432160) [details]
> fix largemenu test by waiting for window to move in a loop.

This patch seems to fix the failure
Attachment #432160 - Flags: checked-in+
Comment on attachment 432160 [details] [diff] [review]
fix largemenu test by waiting for window to move in a loop.

1.9.2 landing: changeset:   33731:ddc086030f76

m-c landing: changeset:   39361:e0d293fe8408


Need to backport to 1.9.1 still, this patch didn't apply cleanly there.
Ben, can you please grab a fresh build from m-c and see if it passes all the tests on this machine now?
(In reply to comment #20)
> Ben, can you please grab a fresh build from m-c and see if it passes all the
> tests on this machine now?

Will do in a bit
(In reply to comment #19)
> (From update of attachment 432160 [details] [diff] [review])
> 1.9.2 landing: changeset:   33731:ddc086030f76
> 
> m-c landing: changeset:   39361:e0d293fe8408
> 
> 
> Need to backport to 1.9.1 still, this patch didn't apply cleanly there.

Turns out we don't fail on 1.9.1, so we don't need to bother with this.



Ehsan, chrome and browser-chrome tests fully pass on m-c now!
So, should this be FIXED now?
(In reply to comment #23)
> So, should this be FIXED now?

I'm going to wait until the patch in https://bugzilla.mozilla.org/show_bug.cgi?id=518970 lands on 1.9.1 and 1.9.2 first, actually.
(In reply to comment #24)
> (In reply to comment #23)
> > So, should this be FIXED now?
> 
> I'm going to wait until the patch in
> https://bugzilla.mozilla.org/show_bug.cgi?id=518970 lands on 1.9.1 and 1.9.2
> first, actually.

I landed the patch for bug 518970 on both 1.9.1 and 1.9.2.
We're all done here, then.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Whiteboard: [orange]
Product: mozilla.org → Release Engineering