Closed Bug 820489 Opened 7 years ago Closed 7 years ago

Intermittent browser/base/content/test/browser_social_multiprovider.js | uncaught exception - TypeError: navigator.mozSocial is undefined at https://test1.example.com/browser/browser/base/content/test/social_sidebar.html?provider2:7

Categories

(Firefox Graveyard :: SocialAPI, defect)

defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED
Firefox 20

People

(Reporter: ehsan, Assigned: Gavin)

References

Details

(Keywords: intermittent-failure)

Attachments

(4 files, 3 obsolete files)

(Expanding on comment 0)

Rev3 Fedora 12x64 birch opt test mochitest-browser-chrome on 2012-12-11 10:06:36 PST for push 43df2c7577bd

slave: talos-r3-fed64-053

{
TEST-PASS | chrome://mochitests/content/browser/browser/base/content/test/browser_social_multiprovider.js | username name matches provider profile
TEST-PASS | chrome://mochitests/content/browser/browser/base/content/test/browser_social_multiprovider.js | username name matches provider profile
TEST-PASS | chrome://mochitests/content/browser/browser/base/content/test/browser_social_multiprovider.js | side bar URL is set
TEST-INFO | chrome://mochitests/content/browser/browser/base/content/test/browser_social_multiprovider.js | sub-test testProviderSwitch complete
TEST-UNEXPECTED-FAIL | chrome://mochitests/content/browser/browser/base/content/test/browser_social_multiprovider.js | uncaught exception - TypeError: navigator.mozSocial is undefined at https://test1.example.com/browser/browser/base/content/test/social_sidebar.html?provider2:7
Stack trace:
    JS frame :: chrome://mochikit/content/tests/SimpleTest/SimpleTest.js :: simpletestOnerror :: line 1065
    native frame :: <unknown filename> :: <TOP_LEVEL> :: line 0
}
Blocks: 809694
Hardware: x86 → x86_64
Summary: Intermittent failure in browser/base/content/test/browser_social_multiprovider.js | uncaught exception - TypeError: navigator.mozSocial is undefined at https://test1.example.com/browser/browser/base/content/test/social_sidebar.html?provider2:7 → Intermittent browser/base/content/test/browser_social_multiprovider.js | uncaught exception - TypeError: navigator.mozSocial is undefined at https://test1.example.com/browser/browser/base/content/test/social_sidebar.html?provider2:7
This seems the most likely reason for the intermittent orange.
Assignee: nobody → mhammond
Attachment #691110 - Flags: review?(gavin.sharp)
Attachment #691110 - Flags: review?(gavin.sharp) → review+
https://hg.mozilla.org/mozilla-central/rev/481814c74277
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 20
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Shane points out that this test isn't waiting for the sidebar to load before doing it's thing and then disabling social, so we can get into a state where we disable social (and therefore MozSocialAPI) right before the sidebar loads - so it fails to find mozSocial when it does.

I thought unloadSidebar would immediately stop existing loads, but it just sets "src" on the frame, and reading through some frameloader code it appears that that starts the new load asynchronously. We could probably also fix that by explicitly calling "stop()" on the browser.
Attached patch patchSplinter Review
I thought about using a message listener like the other tests, but this seemed simpler.
Assignee: mhammond → gavin.sharp
Attachment #691110 - Attachment is obsolete: true
Status: REOPENED → ASSIGNED
Attachment #691552 - Flags: review?(mixedpuppy)
Attachment #691552 - Flags: review?(mhammond)
Comment on attachment 691552 [details] [diff] [review]
patch

Review of attachment 691552 [details] [diff] [review]:
-----------------------------------------------------------------

untested but LGTM
Attachment #691552 - Flags: review?(mhammond) → review+
Comment on attachment 691552 [details] [diff] [review]
patch

this looks good, hard to know until the tests are run on the servers
Attachment #691552 - Flags: review?(mixedpuppy) → review+
https://hg.mozilla.org/mozilla-central/rev/edb2cfd39554
Status: ASSIGNED → RESOLVED
Closed: 7 years ago7 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I managed to provoke the sidebar tests into hitting this problem.
Attachment #692085 - Flags: review?(mixedpuppy)
Attachment #692085 - Flags: review?(gavin.sharp)
Attachment #692085 - Flags: review?(gavin.sharp) → review+
As checked in - I fixed a trivial typo in a test comment.

https://hg.mozilla.org/integration/mozilla-inbound/rev/21362f7ca849
Attachment #692085 - Attachment is obsolete: true
Attachment #692085 - Flags: review?(mixedpuppy)
Attachment #692170 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/21362f7ca849
Status: REOPENED → RESOLVED
Closed: 7 years ago7 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
The problem is that the "provider-removed" notification is sent asynchronously.  The provider has .enabled set to false, then the notification is "queued".  Before the notification fires, the sidebar load event fires but sees the provider is disabled so doesn't inject the mozSocial object into the content.

The fix is to send the notification synchronously to avoid an event loop spin after the provider is disabled but before the sidebar gets unloaded.

I haven't changed provider-added is I can't see how that would have a similar problem - the provider isn't enabled or set "current" in addProvider().
Attachment #692777 - Flags: review?(gavin.sharp)
After chatting with Gavin on IRC, we decided that instead of doing less async, we'd do more :)  provider.enabled is now set async and as the notification is also done in the async block, the problem is still fixed (ie, there is no event-loop spin between provider.enabled being set and the notification).

I've a "tricky try" run which looks like it is going to say the problem really is solved with this - https://tbpl.mozilla.org/?tree=Try&rev=2c20b4562f05
Attachment #692777 - Attachment is obsolete: true
Attachment #692777 - Flags: review?(gavin.sharp)
Attachment #692802 - Flags: review?(gavin.sharp)
Comment on attachment 692802 [details] [diff] [review]
provider.enabled done async

This patch still leaves the same problem - MozSocialAPI calls SocialService.getProvider which will return null between removeProvider being called and the observer firing.
Attachment #692802 - Flags: review?(gavin.sharp)
Test disabled temporarily, until we have a working fix:
https://hg.mozilla.org/integration/mozilla-inbound/rev/7d931025cfbf
Whiteboard: [test disabled][leave open]
So I thought about this some more, and looked at the debugging output from your try run. I think what's happening is:

1) test ends, we call removeProvider, it schedules disabling/removal (with the latest patch here).
2) the sidebar global is created, which triggers the MozSocialAPI injectController code, which calls getProvider for the origin, which schedules retrieving the provider to check its enabled state.
3) the scheduled removal occurs
4) the scheduled retrieval fails, because of 4) 
5) the sidebar loads, and fails to retrieve mozSocial because 4) caused it to fail.

Without your patch, 4) succeeds, but we don't insert mozSocial because the disabling has already occurred, so almost the same situation.

Overall, the core problem seems to be that the sidebar is loading at all after we've finished the test. Indeed I was wondering why removeProvider was being called before we disabled the entire feature, since the code in runSocialTestWithProvider explicitly disables first (which should immediately unload the sidebar via unloadSidebar). Then I found this:

http://hg.mozilla.org/mozilla-central/annotate/ba26dc1c6267/browser/base/content/test/social/head.js#l82

We apparently forgot to update finishSocialTest in bug 809694 - it looks wrong now that runSocialTestWithProvider supports multiple providers. Since we're relying on the cleanup function to do the cleanup anyways, seems like we should just remove it, and hopefully that would fix this?

I think your latest patch (attachment 692802 [details] [diff] [review]) is still valuable, despite it not fixing this.
Something like this. We can't just remove that "finish callback" and rely on the cleanup function, because we want the removal to complete before finish() actually gets called (that's why this setup is so complicated to begin with).

(It's possible we don't really want that, and could simplify this a bunch, but it seems safest to wait until we've restored things fully before starting a new test.)
Attachment #693029 - Flags: feedback?(mhammond)
(In reply to :Gavin Sharp (use gavin@gavinsharp.com for email) from comment #68)
> 3) the scheduled removal occurs
> 4) the scheduled retrieval fails, because of 4) 

Er, because of 3), perhaps obviously :)
Attachment #693029 - Attachment is patch: true
Comment on attachment 693029 [details] [diff] [review]
fix runSocialTestWithProvider

excellent catch that it was the removeProvider call *before* resetting Social.enabled which is the root of the problem
Attachment #693029 - Flags: feedback?(mhammond) → review+
Re-enabled the test:
https://hg.mozilla.org/integration/mozilla-inbound/rev/3c01cb463c00
Whiteboard: [test disabled][leave open]
Status: REOPENED → RESOLVED
Closed: 7 years ago7 years ago
Resolution: --- → FIXED
Oh, bit pre-emptive, given I was looking at comment 74, rather than comment 79, but not worth spamming the blocked bug again by reopening when tomorrow's merge will bring the test re-enable.
(In reply to TinderboxPushlog Robot from comment #82)

Before anyone freaks out too much, this is an old orange from before the patch landed. Phew!
(In reply to :Gavin Sharp (use gavin@gavinsharp.com for email) from comment #83)
> (In reply to TinderboxPushlog Robot from comment #82)
> 
> Before anyone freaks out too much, this is an old orange from before the
> patch landed. Phew!

Sorry yeah I normally remember to suppress those if the bug was fixed (eg use add comment rather than selecting that failure row).
Product: Firefox → Firefox Graveyard
You need to log in before you can comment on or make changes to this bug.