Closed Bug 450637 Opened 16 years ago Closed 16 years ago

Single reftest fails on win32 unittest VMs but passes on win32 hardware

Categories

(Core :: Layout, defect)

1.9.0 Branch
x86
Windows Server 2003
defect
Not set
blocker

Tracking

()

VERIFIED FIXED

People

(Reporter: lsblakk, Assigned: jruderman)

References

Details

(Keywords: verified1.9.0.4)

Attachments

(1 file)

On the morning of Tuesday, August 12th, we moved the 1.9 unittest boxes over to a new network. Since getting the buildslaves up and running again, the same test has failed repeatedly on all three win32 boxes:

REFTEST UNEXPECTED FAIL: file:///E:/slave/trunk_2k3_pgo/mozilla/layout/reftests/bugs/212563-1.html

Any information on what could be causing this is appreciated.
This failure is keeping the 1.9 tree closed (I'm closing it now) right before code freeze tomorrow.
Severity: normal → blocker
This was "fixed" by Lukas by bringing back old machines. Do we want to close this or keep investigating the cause later?
What was the rest of the failure log for that test? Like the two data: URIs?
This is still failing on fx-win32-1.9-slave09 though - the pgo box. So unless we can switch that one to a physical box and see if it stops happening, this bug is still valid.
Assignee: nobody → jruderman
Jesse and I discussed this, and I believe it's a race condition in the test that's related to it being a PGO build run in a VM. Reftest takes the snapshot onload of the iframe, but in this test, the iframe's document rewrites the iframe in its onload handler, so I think the behavior might be undefined. I think we should change the test to bubble up an event from the iframe when it's finished rewriting the frame, and the parent can use the "reftest-wait" class and remove it upon receiving the event from the frame.
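The proposal above can be illustrated with a toy model of the handshake. This is only a sketch under stated assumptions: `FakePage`, `readyForSnapshot`, and `runScenario` are hypothetical names invented for illustration, not reftest harness or Gecko APIs, and the "late rewrite" is simulated with a queued task rather than a real iframe.

```typescript
// Hypothetical model of a reftest-style harness; none of these names are real APIs.
class FakePage {
  classNames: string[] = []; // classes on the root element
  loaded = false;            // has the parent's load event fired?
  content = "";              // what a snapshot would capture
}

// Assumed harness rule: snapshot only after load, and only once
// "reftest-wait" is no longer present on the root element.
function readyForSnapshot(page: FakePage): boolean {
  return page.loaded && page.classNames.indexOf("reftest-wait") === -1;
}

function runScenario(useReftestWait: boolean): string {
  const page = new FakePage();
  if (useReftestWait) {
    page.classNames.push("reftest-wait");
  }

  // The frame's rewrite finishes after the parent's load event (the losing
  // side of the race). When done, the frame notifies the parent, which
  // removes the class.
  const pendingTasks: Array<() => void> = [
    () => {
      page.content = "foo";
      page.classNames = page.classNames.filter((c) => c !== "reftest-wait");
    },
  ];

  page.loaded = true; // parent onload fires first

  let snapshot: string | null = null;
  if (readyForSnapshot(page)) {
    snapshot = page.content; // harness snapshots at onload if allowed
  }
  for (let i = 0; i < pendingTasks.length; i++) {
    pendingTasks[i]();
    if (snapshot === null && readyForSnapshot(page)) {
      snapshot = page.content; // harness snapshots once the class is gone
    }
  }
  return snapshot === null ? "" : snapshot;
}
```

In this model, `runScenario(false)` captures an empty frame because the snapshot is taken at onload, while `runScenario(true)` captures "foo" because the snapshot is deferred until the class is removed. The real test would express the same idea with a class on the root element and a callback from the frame.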
Why would that cause a race? The onload for the iframe should fire, the rewriting happen, then the snapshot be taken. Does the rewriting start new loads or something?
The parent document onload is what triggers reftest to take the snapshot. Clearly something bad happens here, as the failure mode is that the test document's snapshot comes out with an empty iframe.
bz, the onload "rewriting" is a document.write. Does that count as a "new load" that's allowed to happen asynchronously?
Hmm. document.write might be async in some cases in terms of the DOM appearing (even in cases when it's not writing out <script>), yes. I'm actually not sure. Blake, do you know? Ted, I realize something is going wrong. I just want to make sure it's not a bug that webpages would run into.
Done:
* Checked that it still crashes Mozilla 1.6 Alpha 1.
* Checked that the reftest still passes on my machine.

To do:
* Make sure that "document.write that blows away the document" is allowed to be asynchronous (bz/mrbkap).
* Find out whether this change actually fixes the problem on the PGO box.
Something fun to add to the mix: On unittest 1.9 staging (see Tinderbox tree UnitTest), the Windows VM fx-win32-1.9-slave07 has also been failing the single reftest, but the other Windows VM, fx-win32-1.9-slave08, which does PGO builds, is not. These two VMs were created at the same time and afaik have the same configuration.
fx-win32-1.9-slave07 just passed this reftest on a build starting at 9am PDT on staging-master
that was an anomaly - it's back to failing again.
FWIW, the clobber build (from bug 454696) at 2008/09/10 16:27 was green (i.e. passed reftest for bug 212563), but the subsequent three runs all failed it. Could it matter if dist/bin is blown away each run?
(In reply to comment #11)
> Created an attachment (id=334355) [details]
> patch: change the reftest to use reftest-wait
>
> Done:
> * Checked that it still crashes Mozilla 1.6 Alpha 1.
> * Checked that the reftest still passes on my machine.
>
> To do:
> * Make sure that "document.write that blows away the document" is allowed to be
> asynchronous (bz/mrbkap).
> * Find out whether this change actually fixes the problem on the PGO box.

Any progress here? This machine is pretty much perma-orange on the Firefox 3.0 tinderbox.
I think mrbkap will look soon.
bz and I talked on IRC the other day. We decided that this fix is correct (we shouldn't be relying on the parser not to interrupt itself during document.write).
Changed summary to reflect that this is occurring on both 1.9 and mozilla-central on VMs only.
Summary: Single reftest fail on win32 unittest 1.9 boxes since network switch → Single reftest fail on win32 unittest VMs
(In reply to comment #19)
> bz and I talked on IRC the other day. We decided that this fix is correct (we
> shouldn't be relying on the parser not to interrupt itself during
> document.write).

Hi, sorry, I didn't follow. Are you saying that this test failure is expected?

Note: we are seeing consistent failure on VMs, on both 1.9 and m-c, and consistent passing on physical hardware, on both 1.9 and m-c.

For background, note that we are migrating a bunch of machines from the QA network to the Build network, and are currently blocked mid-migration trying to figure out what is causing this test failure. Hence the urgency.
Summary: Single reftest fail on win32 unittest VMs → Single reftest fails on win32 unittest VMs but passes on win32 hardware
(In reply to comment #21)
> Sorry, I didn't follow. Are you saying that this test failure is expected?

Yes.
I checked in the test change (hg & cvs). I hope it fixes the orange!
Status: NEW → RESOLVED
Closed: 16 years ago
Keywords: fixed1.9.0.3
Resolution: --- → FIXED
So what happens now is that the first build goes green and then subsequent builds continue to fail this reftest - on both 1.9 and m-c, still only an issue on VMs. Should have more results on this tomorrow morning after a few builds have cycled to see if it ever passes again without intervention.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Whoa:

REFTEST UNEXPECTED FAIL (LOADING): file:///E:/slave/trunk_2k3_pgo/mozilla/layout/reftests/bugs/212563-1.html

That's not at all what I was expecting, but I guess it's consistent with the "foo in iframe" vs "blank iframe" reftest failures we saw before my test change went in. I guess this timing issue somehow manages to stop one of the onloads from firing. I added some dumps so we can at least figure out which one. I hope the unit test build machines on which this test fails have dump enabled (debug builds or browser.dom.window.dump.enabled).

New theories:
* the innermost frame's onload fires before the middle frame finishes HTML-parsing, causing the document.write() to append to the <frameset> document instead of replace it with an <html> document.
* the innermost frame's onload 'd.close();' call fails because of xpconnect being weird.
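The first theory can be made concrete with a toy model. This is a sketch of the theory as stated in the comment, not of Gecko's parser or of exact specification behavior: `FakeFrameDoc` and `middleFrameAfterRace` are hypothetical names, and the only assumption encoded is that a write into a document that is still being parsed appends to it, while a write into a finished document replaces it.

```typescript
// Hypothetical model of the race in theory 1; not a real DOM or parser API.
class FakeFrameDoc {
  parsing = true; // true while the original markup is still being parsed
  markup: string;

  constructor(initialMarkup: string) {
    this.markup = initialMarkup;
  }

  // Assumed behavior: appending while parsing, replacing once finished.
  write(text: string): void {
    if (this.parsing) {
      this.markup += text;
    } else {
      this.markup = text; // the old document is blown away
      this.parsing = true; // the freshly opened document accepts more writes
    }
  }

  finishParsing(): void {
    this.parsing = false;
  }
}

// Simulates the middle frame when the innermost frame's onload handler
// calls write() on it, either before or after it has finished parsing.
function middleFrameAfterRace(innerOnloadFiresFirst: boolean): string {
  const middle = new FakeFrameDoc("<frameset>");
  if (!innerOnloadFiresFirst) {
    middle.finishParsing(); // the expected order: parse completes, then onload
  }
  middle.write("<html>foo</html>");
  return middle.markup;
}
```

Under these assumptions, `middleFrameAfterRace(false)` yields the replaced `<html>` document, while `middleFrameAfterRace(true)` leaves the written markup appended to the `<frameset>` document, which would match an unexpected result on machines where the timing differs.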
I wish reftest "loading" errors said whether the test didn't finish loading, the test continued to have "reftest-wait", or both.
Theory 1 sounds pretty plausible to me, for what it's worth.
I wasn't able to make a test that was both deterministic and capable of crashing Mozilla 1.6 Alpha 1. So instead I split it into two tests, one of which is deterministic, and one of which crashes Mozilla 1.6 Alpha 1 and happens to pass regardless of who wins the race. Fun fact: the deterministic testcase fails in Mozilla 1.6 Alpha 1 for an entirely different reason, involving privileges and/or xpconnect being weird ;)
I think that greened it.
Status: REOPENED → RESOLVED
Closed: 16 years ago → 16 years ago
Resolution: --- → FIXED
so much green it hurts the eyes. thanks Jesse!
Status: RESOLVED → VERIFIED
"it's not easy being green"... but it sure looks very very very nice. Thanks Jesse!
(In reply to comment #31)
> so much green it hurts the eyes. thanks Jesse!

I take it, Lukas, that I can mark this as verified by you for 1.9.0.4?
The beautiful greenness of the Firefox3.0 tree (minus bug 460474) is good enough for me!