xpcshell: intermittent "test_async_history_api.js | 2 == 3" or "0 == 3"

VERIFIED FIXED in mozilla2.0

Status

()

Toolkit
Places
--
major
VERIFIED FIXED
7 years ago
5 years ago

People

(Reporter: mbrubeck, Assigned: sdwilsh)

Tracking

({intermittent-failure})

Trunk
mozilla2.0
intermittent-failure
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [See comment 12] , URL)

Attachments

(1 attachment, 2 obsolete attachments)

(Reporter)

Description

7 years ago
+++ This bug was initially created as a clone of Bug #629979 +++

This is almost the same failure as Bug 629979, but since 2011-02-03 it's appearing intermittenly in Firefox on all platforms.  Latest:

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1297211911.1297212896.7128.gz

TEST-INFO | (xpcshell/head.js) | test 2 finished
TEST-PASS | /home/cltbld/talos-slave/test/build/xpcshell/tests/toolkit/components/places/tests/unit/test_async_history_api.js | [null : 578] true == true
TEST-PASS | /home/cltbld/talos-slave/test/build/xpcshell/tests/toolkit/components/places/tests/unit/test_async_history_api.js | [null : 578] false == false
TEST-PASS | /home/cltbld/talos-slave/test/build/xpcshell/tests/toolkit/components/places/tests/unit/test_async_history_api.js | [null : 585] places-updatePlaces-complete == places-updatePlaces-complete
TEST-UNEXPECTED-FAIL | /home/cltbld/talos-slave/test/build/xpcshell/tests/toolkit/components/places/tests/unit/test_async_history_api.js | 2 == 3 - See following stack:
JS frame :: /home/cltbld/talos-slave/test/build/xpcshell/head.js :: do_throw :: line 439
JS frame :: /home/cltbld/talos-slave/test/build/xpcshell/head.js :: do_check_eq :: line 491
JS frame :: /home/cltbld/talos-slave/test/build/xpcshell/tests/toolkit/components/places/tests/unit/test_async_history_api.js :: <TOP_LEVEL> :: line 586
TEST-INFO | (xpcshell/head.js) | exiting test

See comments in bug 629979 for some previous logs.
Summary: xpcshell: intermittent "test_async_history_api.js | 2 == 3" → [Firefox] xpcshell: intermittent "test_async_history_api.js | 2 == 3"
Whiteboard: [orange]
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
On Thunderbird tree this failure has now become pretty much 100% perma orange. It seems to have started around the same time as this bug started.

Although we're seeing it on Windows and not Linux (both our Windows and Linux builders are VMs).

Given that this is also more severe on Firefox, I'm promoting this to major, this one is getting annoying.
Severity: normal → major
(moving to toolkit/places which seems more appropriate)
Component: Bookmarks & History → Places
Product: Firefox → Toolkit
QA Contact: bookmarks → places
Inspection strongly suggests that test_observer_topic_dispatched_when_complete is observing the TOPIC_UPDATEPLACES_COMPLETE from the prior test, test_unstored_sessionId_ignored, which makes no effort to make sure that its TOPIC_UPDATEPLACES_COMPLETE is consumed.

The number ends up being 2 because the TRANSITION_EMBED event's callback is synchronously dispatched to the main thread during the call to updatePlaces by StoreAndNotifyEmbedVisit.  Same deal with the data url (but done by canAddUri).

So the event stream related to the main thread looks like this:

-1) The prior test calls updatePlaces which enqueues the async db operation and the bouncy notification operation to the async thread.
(the async db operation runs on the async thread)
(the async db operation callback is enqueued on the main thread...)

1) the prior test's callback is invoked, causing the next test to be queued.
2) the test logic gets invoked, calling updatePlaces, enqueueing 2 callback invocations on the main thread and one thing on the async thread.
(at some point after point 2, the notification bouncer gets run on the async thread and enqueued on the main thread)
3) callback invoc 1 fires
4) callback invoc 2 fires
5) the notification from the previous test fires, breaking things.

I am not going to provide a patch for this, but hope the places team finds the analysis useful.
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
See also Bug 633652, filed on a failure in the same test but with "0 == 3"
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
(Assignee)

Comment 21

7 years ago
(In reply to comment #12)
> I am not going to provide a patch for this, but hope the places team finds the
> analysis useful.
This is what I was worried was happening.  Thanks for looking into this Andrew!
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Created attachment 513993 [details] [diff] [review]
Workaround

Workaround for the issue - put the test that relies on observers first, so that it doesn't get caught up with previous tests.

This has passed at least 9 times on Thunderbird's try server, and we've been seeing it pretty much constantly on our boxes.

I suspect this doesn't actually fix this bug, but just works around it for now, hence the added comment in the patch.
Attachment #513993 - Flags: review?(sdwilsh)
Created attachment 513994 [details] [diff] [review]
Workaround

Get the right patch version this time.
Attachment #513994 - Flags: review?(sdwilsh)
Attachment #513993 - Attachment is obsolete: true
Attachment #513993 - Flags: review?(sdwilsh)
Whiteboard: [orange] → [See comment 12] [orange]
I've reproduced this locally, the analysis in comment 12 seems fine.
Rather than moving the test and hitting this again in future, I'd prefer changing run_next_test to wait async stuff, I'm trying a local patch with that change.
(Assignee)

Comment 70

7 years ago
(In reply to comment #69)
> I've reproduced this locally, the analysis in comment 12 seems fine.
> Rather than moving the test and hitting this again in future, I'd prefer
> changing run_next_test to wait async stuff, I'm trying a local patch with that
> change.
We wanted to move run_next_test up to head.js though (so anyone writing an async test can use the helper method).

Updated

7 years ago
Summary: [Firefox] xpcshell: intermittent "test_async_history_api.js | 2 == 3" → [Firefox] xpcshell: intermittent "test_async_history_api.js | 2 == 3" or "0 == 3"

Updated

7 years ago
Duplicate of this bug: 633652
(Assignee)

Comment 72

7 years ago
Created attachment 514218 [details] [diff] [review]
v1.0
[Checked in: Comment 75]

This is the proper fix.  We wait for all background thread work to be done before running the next test.
Assignee: nobody → sdwilsh
Attachment #513994 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #514218 - Flags: review?(mak77)
Attachment #513994 - Flags: review?(sdwilsh)
Comment on attachment 514218 [details] [diff] [review]
v1.0
[Checked in: Comment 75]

discussed the various approaches on IRC, this is fine since somebody in the team is always going to review additional tests here.

Regarding if this method is going to work, I've locally edited run_next_test to always use waitForAsyncUpdates, and made a loop repeating the 2 "culprit" tests hundreds of times. I always failed with 0==3 or 2==3 without the fix after some loop, always passed otherwise. So it should work pretty fine.

Thanks to asuth for analysis and standard8 for the tested workaround.
Attachment #514218 - Flags: review?(mak77) → review+
(Assignee)

Updated

7 years ago
Keywords: checkin-needed
Comment hidden (Treeherder Robot)
Comment on attachment 514218 [details] [diff] [review]
v1.0
[Checked in: Comment 75]

http://hg.mozilla.org/mozilla-central/rev/d39a263d7f14
Attachment #514218 - Attachment description: v1.0 → v1.0 [Checked in: Comment 75]
Status: ASSIGNED → RESOLVED
Last Resolved: 7 years ago
status2.0: ? → ---
Flags: in-testsuite+
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → mozilla2.0
Summary: [Firefox] xpcshell: intermittent "test_async_history_api.js | 2 == 3" or "0 == 3" → xpcshell: intermittent "test_async_history_api.js | 2 == 3" or "0 == 3"
Firefox is still green, Windows Thunderbird perma-orange seems fixed, Windows SeaMonkey p-o should be fixed too...

V.Fixed.
Status: RESOLVED → VERIFIED
Keywords: intermittent-failure
Whiteboard: [See comment 12] [orange] → [See comment 12]
You need to log in before you can comment on or make changes to this bug.