Closed Bug 1229195 Opened 5 years ago Closed 5 years ago

Intermittent browser_mozLoop_chat.js | Condition didn't pass. -

Categories

(Hello (Loop) :: Client, defect, P2)

Unspecified
macOS
defect
Points:
8

Tracking

(firefox45 wontfix, firefox46 fixed, firefox47 fixed, firefox48 fixed, firefox-esr45 wontfix)

RESOLVED FIXED
mozilla48
Iteration:
47.2 - Feb 22
Tracking Status
firefox45 --- wontfix
firefox46 --- fixed
firefox47 --- fixed
firefox48 --- fixed
firefox-esr45 --- wontfix

People

(Reporter: KWierso, Assigned: mikedeboer)

References

Details

(Keywords: intermittent-failure, Whiteboard: [btpp-fix-now][btpp-follow-up-2016-03-10])

Attachments

(6 files)

https://treeherder.mozilla.org/logviewer.html#?job_id=3473802&repo=b2g-inbound


The first instances of this I'm seeing are on this merge from m-c to b2g-inbound:
https://hg.mozilla.org/integration/b2g-inbound/pushloghtml?changeset=21f92638bb14

I'm seeing a bunch of Loop changes in there. Any ideas, Mike/Mark?
Flags: needinfo?(standard8)
Flags: needinfo?(mdeboer)
This looks like the big move to an add-on, although I can't tell you what might be the exact cause of this issue.
Except that - looking at the logs - the `promiseWaitForCondition` in the 'test_mozLoop_hangupAllChatWindows' test doesn't seem to resolve.

The only way I *think* we can attempt to fix this is to print all the open chatboxes and see why the 'LoopHangupNow' event doesn't close the chat window.
Blocks: 1223573
Flags: needinfo?(mdeboer)
Mike's right - I also can't tell think what the issue would be. I'll tag myself for doing some try pushes.
Assignee: nobody → standard8
Flags: needinfo?(standard8)
Mark, please also see the change I propose to the test in bug 1227539. We're passing the wrong argument to LoopRooms.open, for sure.
Depends on: 1227539
This is either fixed or a lot rarer now. Its a bit hard to say, but I'm going to mark it as fixed. If it continues being an issue then we can get a fresh bug filed.
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Mike, this seems to have been started again, or something very similar, since we landed the e10s stuff:

https://treeherder.mozilla.org/#/jobs?repo=fx-team&fromchange=f09ea2712445&selectedJob=7064227

 14:35:38     INFO -  222 INFO Entering test test_mozLoop_hangupAllChatWindows
 14:35:38     INFO -  223 INFO Console message: [JavaScript Warning: "Unknown pseudo-class or pseudo-element '-webkit-input-placeholder'.  Ruleset ignored due to bad selector." {file: "chrome://loop/content/shared/css/conversation.css" line: 1035 column: 31 source: ".text-chat-box > form > input::-webkit-input-placeholder {"}]
 14:35:38     INFO -  224 INFO Console message: [JavaScript Warning: "Unknown pseudo-class or pseudo-element '-ms-input-placeholder'.  Ruleset ignored due to bad selector." {file: "chrome://loop/content/shared/css/conversation.css" line: 1050 column: 30 source: ".text-chat-box > form > input:-ms-input-placeholder {"}]
 14:35:38     INFO -  225 INFO Console message: [JavaScript Warning: "Unknown pseudo-class or pseudo-element 'input-placeholder'.  Ruleset ignored due to bad selector." {file: "chrome://loop/content/shared/css/conversation.css" line: 1055 column: 30 source: ".text-chat-box > form > input:input-placeholder {"}]
 14:35:38     INFO -  226 INFO Console message: [JavaScript Warning: "Unknown pseudo-class or pseudo-element '-webkit-input-placeholder'.  Ruleset ignored due to bad selector." {file: "chrome://loop/content/shared/css/conversation.css" line: 1035 column: 31 source: ".text-chat-box > form > input::-webkit-input-placeholder {"}]
 14:35:38     INFO -  227 INFO Console message: [JavaScript Warning: "Unknown pseudo-class or pseudo-element '-ms-input-placeholder'.  Ruleset ignored due to bad selector." {file: "chrome://loop/content/shared/css/conversation.css" line: 1050 column: 30 source: ".text-chat-box > form > input:-ms-input-placeholder {"}]
 14:35:38     INFO -  228 INFO Console message: [JavaScript Warning: "Unknown pseudo-class or pseudo-element 'input-placeholder'.  Ruleset ignored due to bad selector." {file: "chrome://loop/content/shared/css/conversation.css" line: 1055 column: 30 source: ".text-chat-box > form > input:input-placeholder {"}]
 14:35:38     INFO -  229 INFO Console message: [JavaScript Warning: "unreachable code after return statement" {file: "chrome://loop/content/shared/vendor/sdk.js" line: 11464 column: 4 source: "    eval(obj);
 14:35:38     INFO -  "}]
 14:35:38     INFO -  230 INFO Console message: [JavaScript Warning: "unreachable code after return statement" {file: "chrome://loop/content/shared/vendor/sdk.js" line: 11464 column: 4 source: "    eval(obj);
 14:35:38     INFO -  "}]
 14:35:38     INFO -  231 INFO Console message: [JavaScript Warning: "mutating the [[Prototype]] of an object will cause your code to run very slowly; instead create the object with the correct initial [[Prototype]] value using Object.create" {file: "chrome://loop/content/shared/vendor/sdk.js" line: 12164}]
 14:35:38     INFO -  232 INFO Console message: [JavaScript Warning: "unreachable code after return statement" {file: "chrome://loop/content/shared/vendor/sdk.js" line: 11464 column: 4 source: "    eval(obj);
 14:35:38     INFO -  "}]
 14:35:38     INFO -  233 INFO Console message: [JavaScript Warning: "unreachable code after return statement" {file: "chrome://loop/content/shared/vendor/sdk.js" line: 11464 column: 4 source: "    eval(obj);
 14:35:38     INFO -  "}]
 14:35:38     INFO -  234 INFO Console message: [JavaScript Warning: "mutating the [[Prototype]] of an object will cause your code to run very slowly; instead create the object with the correct initial [[Prototype]] value using Object.create" {file: "chrome://loop/content/shared/vendor/sdk.js" line: 12164}]
 14:35:38     INFO -  235 INFO Console message: [JavaScript Error: "The Components object is deprecated. It will soon be removed." {file: "chrome://loop/content/shared/js/utils.js" line: 9}]
 14:35:38     INFO -  236 INFO Console message: [JavaScript Error: "The Components object is deprecated. It will soon be removed." {file: "chrome://loop/content/shared/js/utils.js" line: 9}]
 14:35:38     INFO -  237 INFO Console message: [JavaScript Warning: "Empty string passed to getElementById()." {file: "chrome://global/content/bindings/remote-browser.xml" line: 222}]
 14:35:38     INFO -  238 INFO Console message: [JavaScript Warning: "Empty string passed to getElementById()." {file: "chrome://global/content/bindings/remote-browser.xml" line: 222}]
 14:35:38     INFO -  239 INFO Console message: [JavaScript Error: "TypeError: content is null" {file: "chrome://browser/content/social-content.js" line: 62}]
 14:35:38     INFO -  240 INFO TEST-UNEXPECTED-FAIL | browser/extensions/loop/chrome/test/mochitest/browser_mozLoop_chat.js | Condition didn't pass. -
 14:35:38     INFO -  Stack trace:
 14:35:38     INFO -  chrome://mochitests/content/browser/browser/extensions/loop/chrome/test/mochitest/head.js:waitForCondition/interval<:78
 14:35:38     INFO -  Not taking screenshot here: see the one that was previously logged
 14:35:38     INFO -  241 INFO TEST-UNEXPECTED-FAIL | browser/extensions/loop/chrome/test/mochitest/browser_mozLoop_chat.js | chat window should have been closed - false == true - JS frame :: chrome://mochitests/content/browser/browser/extensions/loop/chrome/test/mochitest/browser_mozLoop_chat.js :: test_mozLoop_hangupAllChatWindows :: line 38
 14:35:38     INFO -  Stack trace:
 14:35:38     INFO -      chrome://mochitests/content/browser/browser/extensions/loop/chrome/test/mochitest/browser_mozLoop_chat.js:test_mozLoop_hangupAllChatWindows:38
 14:35:38     INFO -      Handler.prototype.process@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:937:23
 14:35:38     INFO -      this.PromiseWalker.walkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:816:7
 14:35:38     INFO -      Promise*this.PromiseWalker.scheduleWalkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:747:11
 14:35:38     INFO -      this.PromiseWalker.schedulePromise@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:779:7
 14:35:38     INFO -      Promise.prototype.then@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:454:5
 14:35:38     INFO -      this.PromiseWalker.completePromise@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:705:7
 14:35:38     INFO -      TaskImpl_run@resource://gre/modules/Task.jsm:324:13
 14:35:38     INFO -      TaskImpl@resource://gre/modules/Task.jsm:280:3
 14:35:38     INFO -      createAsyncFunction/asyncFunction@resource://gre/modules/Task.jsm:254:14
 14:35:38     INFO -      Tester_execTest/<@chrome://mochikit/content/browser-test.js:808:21
 14:35:38     INFO -      TaskImpl_run@resource://gre/modules/Task.jsm:319:40
 14:35:38     INFO -      TaskImpl@resource://gre/modules/Task.jsm:280:3
 14:35:38     INFO -      createAsyncFunction/asyncFunction@resource://gre/modules/Task.jsm:254:14
 14:35:38     INFO -      Task_spawn@resource://gre/modules/Task.jsm:168:12
 14:35:38     INFO -      Tester_execTest@chrome://mochikit/content/browser-test.js:803:9
 14:35:38     INFO -      Tester.prototype.nextTest</<@chrome://mochikit/content/browser-test.js:723:7
 14:35:38     INFO -      SimpleTest.waitForFocus/waitForFocusInner/focusedOrLoaded/<@chrome://mochikit/content/tests/SimpleTest/SimpleTest.js:741:59
 14:35:38     INFO -  242 INFO Leaving test test_mozLoop_hangupAllChatWindows
Status: RESOLVED → REOPENED
Flags: needinfo?(mdeboer)
Resolution: FIXED → ---
Assignee: standard8 → mdeboer
Status: REOPENED → ASSIGNED
Flags: needinfo?(mdeboer)
Attachment #8717038 - Flags: review?(standard8)
Attachment #8717039 - Flags: review?(standard8) → review+
Comment on attachment 8717038 [details] [review]
[loop] mikedeboer:bug-1229195-dayum > mozilla:master

Looks good. r=Standard8
Attachment #8717038 - Flags: review?(standard8) → review+
Iteration: --- → 47.2 - Feb 22
Points: --- → 8
https://hg.mozilla.org/integration/fx-team/rev/0c51994fbafe65be49c253ed58390f1ef4de691c
Bug 1229195: explicitly link 'content' to the frame script global scope to prevent out-of-scope errors. r=Standard8
Blocks: 1245937
https://hg.mozilla.org/mozilla-central/rev/0c51994fbafe
Status: ASSIGNED → RESOLVED
Closed: 5 years ago5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
This is still happening.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment on attachment 8717039 [details] [diff] [review]
Patch 1: explicitly link 'content' to the frame script global scope

Approval Request Comment
[Feature/regressing bug #]: Bug 1154277
[User impact if declined]: Patches in bug 1154277 increased the orange factor of this intermittent. This patch reduces that number, albeit not eliminating entirely.
[Describe test coverage new/current, TreeHerder]: Landed on m-c.
[Risks and why]: minor.
[String/UUID change made/needed]: n/a.
Attachment #8717039 - Flags: approval-mozilla-aurora?
(In reply to Wes Kocher (:KWierso) from comment #23)
> This is still happening.

This probably _does_ need the Loop/ Hello add-on update to improve the situation.
Comment on attachment 8717039 [details] [diff] [review]
Patch 1: explicitly link 'content' to the frame script global scope

Improve tests, taking it.
Attachment #8717039 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 8717039 [details] [diff] [review]
Patch 1: explicitly link 'content' to the frame script global scope

[Triage Comment]
Needed for beta too. Should be in 45 beta 6.
Attachment #8717039 - Flags: approval-mozilla-beta+
Flags: needinfo?(mdeboer)
Status: REOPENED → ASSIGNED
Comment on attachment 8720760 [details] [review]
[loop] mikedeboer:bug-1229195-waitLonger > mozilla:master

What I've been able to gather from the logs so far is:

1) `content is null` errors in social-content.js is still the most common error, especially on Linux X64. This was not occurring in the patch I pushed to try before last, which returned early with a guard. I pushed a similar patch to try just moments ago, so I can reconfirm.
2) Beta does not have the Map.prototype#clear removal, so there are JS errors occurring in MozLoopService.jsm. This likely doesn't cause the test to fail, but it's certainly not helping things.
3) I other logs I just see the regular flow of messages, but the `promiseWaitForCondition` just times out. So I'd like to up the amount of time we wait in a test to at least 20 seconds. Right now it's 10 seconds. I could live with a minute too.
Attachment #8720760 - Flags: review?(mdeboer)
Target Milestone: mozilla47 → mozilla45
Attachment #8720760 - Flags: review?(mdeboer) → review?(standard8)
Comment on attachment 8720760 [details] [review]
[loop] mikedeboer:bug-1229195-waitLonger > mozilla:master

r=Standard8 with the comment addition requested in the bug.
Attachment #8720760 - Flags: review?(standard8) → review+
Blocks: 1248604
Rank: 15
Priority: -- → P1
Whiteboard: [btpp-fix-now]
(In reply to Mike de Boer [:mikedeboer] from comment #41)
> Created attachment 8720840 [details] [diff] [review]
> Patch 2: ignore messages when content is null

Why is content even null here? In what circumstances? And isn't this then going to catch out any other message listener in the same script?
Flags: needinfo?(mdeboer)
(In reply to :Gijs Kruitbosch from comment #43)
> Why is content even null here?

If I knew, I wouldn't have put the guard here but attempted a proper fix. ;-)

> In what circumstances?

This is happening when a message arrives in the script when the browser element is being destroyed by a) the chatbox is removed from the DOM or b) the detached dialog window is closed.
What I think is happening is that the listeners have a longer lifetime than the `content` object has, which is evidently already GC'd whilst the listeners are alive and kicking.
In my try pushes, I've only seen messages being skipped over of the type 'Social:CustomEvent' and event type 'Loop:HangupNow'. This doesn't affect the unit tests, as the windows are closed properly after all.

> And isn't this then
> going to catch out any other message listener in the same script?

Well in this case no, because there's no point for _any_ of the listeners to be invoked when `content` is null.
Flags: needinfo?(mdeboer)
(In reply to Mike de Boer [:mikedeboer] from comment #45)
> (In reply to :Gijs Kruitbosch from comment #43)
> > And isn't this then
> > going to catch out any other message listener in the same script?
> 
> Well in this case no, because there's no point for _any_ of the listeners to
> be invoked when `content` is null.

Yes, I guess what I meant was: whenever we set content to null, it feels like we should remove all the listeners that were attached from within that scope and/or stop sending messages to them.
Comment on attachment 8720840 [details] [diff] [review]
Patch 2: ignore messages when content is null

Review of attachment 8720840 [details] [diff] [review]:
-----------------------------------------------------------------

You might want to consider removing the listeners as well as per Gijs' comment.
Attachment #8720840 - Flags: review?(standard8) → review+
(In reply to Mark Banner (:standard8) from comment #47)
> Comment on attachment 8720840 [details] [diff] [review]
> Patch 2: ignore messages when content is null
> 
> Review of attachment 8720840 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> You might want to consider removing the listeners as well as per Gijs'
> comment.

To try to be even more explicit: I think whatever is GC'ing content / clearing things up here should be responsible for killing the listeners / messages. So we should find out what it is and fix it "properly" or we'll just run into this next time someone does whatever Hello is doing here.
(In reply to :Gijs Kruitbosch from comment #48)
> To try to be even more explicit: I think whatever is GC'ing content /
> clearing things up here should be responsible for killing the listeners /
> messages. So we should find out what it is and fix it "properly" or we'll
> just run into this next time someone does whatever Hello is doing here.

I agree. I'll file another bug to investigate this further. Landing this fix for a frequently failing test is good for tree health.
Blocks: 1250871
https://hg.mozilla.org/mozilla-central/rev/d785fddce4c1
Status: ASSIGNED → RESOLVED
Closed: 5 years ago5 years ago
Resolution: --- → FIXED
OrangeFactor confirms that this is still hitting regularly since comment 51.
Status: RESOLVED → REOPENED
Flags: needinfo?(mdeboer)
Resolution: FIXED → ---
Target Milestone: mozilla45 → ---
Next Fx Hello update will have a `waitForCondition` update to allow waiting a longer amount of time.
Flags: needinfo?(mdeboer)
FTR, the additional fix landed in fx-team yesterday, and mozilla-central today.
Whiteboard: [btpp-fix-now] → [btpp-fix-now][btpp-follow-up-2016-03-10]
No longer blocks: 1248604
Rank: 15 → 20
Priority: P1 → P2
Given the ongoing high failure rate, I think it's disabling time (especially since it appears the the priority on fixing it was just decreased).
Flags: needinfo?(mdeboer)
Nah, I'm going to try and fix it this week.
Flags: needinfo?(mdeboer)
Well, if this isn't worth a facepalm, I don't know what is. Oboy.
Attachment #8730648 - Flags: review?(standard8)
Try is looking _very_ green.
Comment on attachment 8730648 [details] [diff] [review]
Patch 3: resolve the promise when the chat window has actually finished loading, not right away

Review of attachment 8730648 [details] [diff] [review]:
-----------------------------------------------------------------

r+ once you've improved the comments.

::: browser/extensions/loop/chrome/content/modules/MozLoopService.jsm
@@ +1117,5 @@
>          // away to circumvent glitches.
>          chatboxInstance.setAttribute("customSize", "loopDefault");
>          chatboxInstance.parentNode.setAttribute("customSize", "loopDefault");
>          Chat.loadButtonSet(chatboxInstance, "minimize,swap," + kChatboxHangupButton.id);
> +      } else {

As discussed, it'd be good to move the unit test comment to this section, and make it a bit clearer as to what it is relating to.
Attachment #8730648 - Flags: review?(standard8) → review+
https://hg.mozilla.org/integration/fx-team/rev/c1516ee3efd8a0329fccf93ba0398c69ed296ea4
Bug 1229195: resolve the promise when the chat window has actually finished loading, not right away. r=Standard8
Status: REOPENED → ASSIGNED
When you ready, please request uplift of this to whatever branches won't pick up the fix automatically via an upstream update (I'm looking at you, esr45).
Flags: needinfo?(mdeboer)
https://hg.mozilla.org/mozilla-central/rev/c1516ee3efd8
Status: ASSIGNED → RESOLVED
Closed: 5 years ago5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
Comment on attachment 8730648 [details] [diff] [review]
Patch 3: resolve the promise when the chat window has actually finished loading, not right away

[Approval Request Comment]
User impact if declined: This patch is required to land to fix a frequent intermittent failure in the browser_mozLoop_chat.js browser-chrome mochitest.
Fix Landed on Version: 48 (current m-c)
Risk to taking this patch (and alternatives if risky): minor, please approve m-a and m-b first if that's more convenient.
String or UUID changes made by this patch: n/a.
Flags: needinfo?(mdeboer)
Attachment #8730648 - Flags: approval-mozilla-esr45?
Attachment #8730648 - Flags: approval-mozilla-beta?
Attachment #8730648 - Flags: approval-mozilla-aurora?
Comment on attachment 8730648 [details] [diff] [review]
Patch 3: resolve the promise when the chat window has actually finished loading, not right away

One-line fix for an intermittent failure, Aurora47+, Beta46+
Attachment #8730648 - Flags: approval-mozilla-beta?
Attachment #8730648 - Flags: approval-mozilla-beta+
Attachment #8730648 - Flags: approval-mozilla-aurora?
Attachment #8730648 - Flags: approval-mozilla-aurora+
Comment on attachment 8730648 [details] [diff] [review]
Patch 3: resolve the promise when the chat window has actually finished loading, not right away


Simplify the life of sheriff, taking it.
Should be in 45.1.0
Attachment #8730648 - Flags: approval-mozilla-esr45? → approval-mozilla-esr45+
This doesn't apply cleanly to esr45, can we see what's missing there?
Flags: needinfo?(mdeboer)
This can wait (and hello is disabled in hello)
Flags: needinfo?(sledru)
Since Hello is disabled on ESR, no need to push it up there.
Flags: needinfo?(mdeboer)
so mike can we won't fix this then for esr45 ?
Flags: needinfo?(mdeboer)
Aye!
Flags: needinfo?(mdeboer)
You need to log in before you can comment on or make changes to this bug.