Closed Bug 427142 Opened 17 years ago Closed 17 years ago

sporadic test_393498.js failure on bm-win2k3-pgo01 because of VM timer issues

Categories

(Firefox :: Bookmarks & History, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: dietrich, Assigned: mak)

References

()

Details

(Keywords: fixed1.9.0.2, intermittent-failure)

Attachments

(3 files, 3 obsolete files)

well, one test failure at this point. it went green next run. there were no recent checkins that looked anything related. possibly related to bug 381240. NEXT ERROR ../../../../_tests/xpcshell-simple/test_places/bookmarks/test_393498.js: FAIL ../../../../_tests/xpcshell-simple/test_places/bookmarks/test_393498.js.log: >>>>>>> *** test pending *** exiting *** CHECK FAILED: 1207338863116625 != 1207338863116625 JS frame :: d:/slave/trunk_2k3_pgo/mozilla/tools/test-harness/xpcshell-simple/head.js :: do_throw :: line 99 JS frame :: d:/slave/trunk_2k3_pgo/mozilla/tools/test-harness/xpcshell-simple/head.js :: do_check_neq :: line 109 JS frame :: ../../../../_tests/xpcshell-simple/test_places/bookmarks/test_393498.js :: run_test :: line 129 JS frame :: d:/slave/trunk_2k3_pgo/mozilla/tools/test-harness/xpcshell-simple/tail.js :: _execute_test :: line 41 JS frame :: d:/slave/trunk_2k3_pgo/mozilla/tools/test-harness/xpcshell-simple/execute_test.js :: <TOP_LEVEL> :: line 38 2147500036 *** FAIL ***
I would like to know if this is actually related to bug 381240, because 381240 is a non-blocking p4 bug and I'm wondering if we should continue to block putting the PGO box back in to production for this issue.
Over about 60 runs I've seen this issue ~12 times on two separate PGO boxes. This issue is PGO specific as we haven't seen it on the win2k3 unittest machines once.
I'm seeing the same thing on my virtual build machine, sometimes the test pass, sometimes not. same exact test / line. XP on Virtual PC 2007, with latest Windows Server 2008 SDK and mozillabuild.
mak77, are you using PGO enabled builds?
So, the unittest box on the Firefox tinderbox is not a VM, iirc. (robcee can verify that) This might just be a VM timing issue?
(In reply to comment #4) > mak77, are you using PGO enabled builds? no, i'm using the standard build environment, with these options ac_add_options --enable-debugger-info-modules=yes ac_add_options --disable-installer ac_add_options --disable-accessibility ac_add_options --disable-activex ac_add_options --disable-activex-scripting ac_add_options --disable-libxul
Ok, so we now know that it's not a PGO issue but is a VM issue. Can we get a developer owner for this bug to fix the timing in the test so that it passes more reliably on virtual machines?
Assignee: nobody → dietrich
Attached patch delay (obsolete) — Splinter Review
try delaying verification. Mikeal and Marco, can you please try this locally?
(In reply to comment #8) > Created an attachment (id=314727) [details] > Mikeal and Marco, can you please try this locally? 4 runs: 3 PASS, 1 FAIL
Attached patch duplicationSplinter Review
well, this is NOT a fix, but i'm not able to reproduce the fail with this in... the *hack* is duplicating the call to setItemTitle in the test. without this i fail at least 1 time on 10, with the duplication i've not been able to fail (50 consecutive runs without failing). At least will reduce a lot the failing statistics... Maybe could be useful to find the real cause... As a bonus i did a refactoring of SetItemTitle to be faster (one query is enough and transaction is implicit). We could check-in this as a temporary hack to reduce test failing and then look for the real fix
A failure hasn't been observed in 12 hours on both PGO boxes. Doesn't mean this issue is gone, just an observation.
We are, optimistically, thinking this will be fixed by EOD. We are scheduling production downtime for tomorrow at 10am PST to put the PGO untittest box back on . Are we going to take this *hack*, or are we still waiting on a different fix?
Comment on attachment 314845 [details] [diff] [review] duplication r=me on the test change. please file a different bug on the perf change. We should take this change, and keep this bug open to followup and resolve the root cause of the failures.
Attachment #314845 - Flags: review+
a=mconnor over IM. Checking in toolkit/components/places/tests/bookmarks/test_393498.js; /cvsroot/mozilla/toolkit/components/places/tests/bookmarks/test_393498.js,v <-- test_393498.js new revision: 1.2; previous revision: 1.1 done
Assignee: dietrich → mak77
Observed failure in test_393498.js from a checkout at ~9am PST on Friday April 11th, 2008. http://tinderbox.mozilla.org/showlog.cgi?log=UnitTest/1207931947.1207940927.15739.gz#err0
still, is the fail ratio better now? we should have reduced the possibility of a fail... not 0 though
I've built up a test with a loop to get some statistics about this fault. on 1000 runs i got about: - 100 faults with 1 setItemTitle call, so failing about 1/10 of times - 5 faults with 2 setItemTitle calls (fault ~ 1/100) - 0/1 fault with 3 setItemTitle calls (fault ~ 1/1000) - with n setItemTitle calls fault ~ 1/10^n PR_Now() in these cases is returning the same value returned by getItemTitle (so the value we previously saved to the db, that was another PR_Now()). So definately appear a PR_Now() bug on VM timers.
Can we make the test, after its setTimeout, check that the Date.now() is different from what it was before the setTimeout, and if it isn't, do the setTimeout again? Would that fix this?
Summary: sporadic test failure on bm-win2k3-pgo01 → sporadic test_393498.js failure on bm-win2k3-pgo01 because of VM timer issues
So we did contact VMware technical support (see bug 445580), and they also brought up the point that time adjustments caused by NTP can cause things like this to happen. So I think we should just fix the test to do what comment 19 suggests (and perhaps even check > rather than !=).
At one point in time it was known to not run ntp but I don't know how the Windows VMs are setup. All the ESX hosts are configured to run ntp in the service console. Can someone from RelEng check the settings in the VM?
bm-win2k3-pgo01 / qm-win2k3-pgo01 has out-of-date vmware tools. The vmx config is set correctly: tools.syncTime = "TRUE" Don't know if Windows is still trying to do it's own time sync (and I don't know how to check that). Either way, those tools should be updated.
(In reply to comment #19) > Can we make the test, after its setTimeout, check that the Date.now() is > different from what it was before the setTimeout, and if it isn't, do the > setTimeout again? Would that fix this? I think we should provide a helper function that does this in the test harness, so we can fix any other tests that make assumptions like this.
(In reply to comment #19) > Can we make the test, after its setTimeout, check that the Date.now() is > different from what it was before the setTimeout, and if it isn't, do the > setTimeout again? Would that fix this? we are saving now() in the places db in the backend, and then reading that, so i doubt that it would be fixed that way, instead i was thinking we could maybe manually change the lastModified value for that bookmark in the past before doing the lastModified update, so we will check that it has changed and skip the problem.
Attached patch set lm in the past patch (obsolete) — Splinter Review
this should reduce fail to zero (and from my testing does) for this specific test without nullifying its usefulness (we still check that the lastModified get changed). I suggest taking this specific fix for this testcase and find a more general way for handling the timing problem in VMs in bug 445580.
Attachment #330538 - Flags: review?(dietrich)
Comment on attachment 330538 [details] [diff] [review] set lm in the past patch >? toolkit/components/places/tests/unit/test_433317_query_title_update.js >? toolkit/components/places/tests/unit/test_now.js >Index: toolkit/components/places/tests/bookmarks/test_393498.js >=================================================================== >RCS file: /cvsroot/mozilla/toolkit/components/places/tests/bookmarks/test_393498.js,v >retrieving revision 1.2 >diff -u -8 -p -r1.2 test_393498.js >--- toolkit/components/places/tests/bookmarks/test_393498.js 10 Apr 2008 20:06:11 -0000 1.2 >+++ toolkit/components/places/tests/bookmarks/test_393498.js 21 Jul 2008 09:13:32 -0000 >@@ -117,23 +117,24 @@ function run_test() { > do_check_eq(rootNode.childCount, 1); > var childNode = rootNode.getChild(0); > > // confirm current dates match node properties > do_check_eq(bmsvc.getItemDateAdded(bookmarkId), childNode.dateAdded); > do_check_eq(bmsvc.getItemLastModified(bookmarkId), childNode.lastModified); > > // test live update of lastModified caused by other changes: >+ // We set lastModified 1ms in the past to workaround a timing bug on >+ // virtual machines, see bug 427142 for details. >+ var pastDate = Date.now() * 1000 - 1; >+ bmsvc.setItemLastModified(bookmarkId, pastDate); > // set title (causing update of last modified) > var oldLastModified = bmsvc.getItemLastModified(bookmarkId); >- // This double call to setItemTitle is a temporary hack to workaround a >- // timing bug on virtual machines. See bug 427142 for details. >- bmsvc.setItemTitle(bookmarkId, "Google"); > bmsvc.setItemTitle(bookmarkId, "Google"); >- // test that lm is updated >+ // test that lastModified is updated > do_check_neq(oldLastModified, childNode.lastModified); should change this to be more specific, testing that oldLastModified is < childNode.lastModified. r=me, thanks.
Attachment #330538 - Flags: review?(dietrich) → review+
Attached patch fixed commentSplinter Review
for sure, doing stricter check
Attachment #330538 - Attachment is obsolete: true
Keywords: checkin-needed
Pushed to mozilla-central: http://hg.mozilla.org/index.cgi/mozilla-central/rev/d8fbfb07f444 I'll check into 1.9.0 CVS in a minute. Thanks for the fix!
Status: NEW → RESOLVED
Closed: 17 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Checking in toolkit/components/places/tests/bookmarks/test_393498.js; /cvsroot/mozilla/toolkit/components/places/tests/bookmarks/test_393498.js,v <-- test_393498.js new revision: 1.3; previous revision: 1.2 done
Keywords: fixed1.9.0.2
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attached patch same trick as above (obsolete) — Splinter Review
This particular FAIL does not happen on my VM, while previous was frequent. Hwv this is the same trick as above, set lastModified in the past and then check.
Attachment #332865 - Flags: review?(dietrich)
Comment on attachment 332865 [details] [diff] [review] same trick as above >diff --git a/toolkit/components/places/tests/bookmarks/test_393498.js b/toolkit/components/places/tests/bookmarks/test_393498.js >--- a/toolkit/components/places/tests/bookmarks/test_393498.js >+++ b/toolkit/components/places/tests/bookmarks/test_393498.js >@@ -75,17 +75,19 @@ bmsvc.addObserver(observer, false); > // main > function run_test() { > var testFolder = bmsvc.createFolder(bmsvc.placesRoot, "test Folder", > bmsvc.DEFAULT_INDEX); > var bookmarkId = bmsvc.insertBookmark(testFolder, uri("http://google.com/"), > bmsvc.DEFAULT_INDEX, ""); > do_check_true(observer.itemChangedProperty == null); > >- var newDate = Date.now() * 1000; >+ // We set lastModified 1ms in the past to workaround a timing bug on >+ // virtual machines, see bug 427142 for details. >+ var newDate = Date.now() * 1000 - 1; wouldn't this actually set it to 1 microsecond in the past? r=me with that fixed.
Attachment #332865 - Flags: review?(dietrich) → review+
Attached patch fixed commentSplinter Review
Attachment #332865 - Attachment is obsolete: true
Attachment #314727 - Attachment is obsolete: true
Keywords: checkin-needed
and CVS: Checking in toolkit/components/places/tests/bookmarks/test_393498.js; /cvsroot/mozilla/toolkit/components/places/tests/bookmarks/test_393498.js,v <-- test_393498.js new revision: 1.4; previous revision: 1.3 done
Status: REOPENED → RESOLVED
Closed: 17 years ago17 years ago
Resolution: --- → FIXED
Thanks again for the fixes!
Whiteboard: [orange]
Bug 451915 - move Firefox/Places bugs to Firefox/Bookmarks and History. Remove all bugspam from this move by filtering for the string "places-to-b-and-h". In Thunderbird 3.0b, you do that as follows: Tools | Message Filters Make sure the correct account is selected. Click "New" Conditions: Body contains places-to-b-and-h Change the action to "Delete Message". Select "Manually Run" from the dropdown at the top. Click OK. Select the filter in the list, make sure "Inbox" is selected at the bottom, and click "Run Now". This should delete all the bugspam. You can then delete the filter. Gerv
Component: Places → Bookmarks & History
QA Contact: places → bookmarks
Whiteboard: [orange]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: