Closed
Bug 1369440
Opened 8 years ago
Closed 1 year ago
Intermittent toolkit/components/extensions/test/mochitest/test_ext_contentscript_about_blank.html | Test timed out.
Categories
(WebExtensions :: General, defect, P3)
Tracking
(firefox55 fixed, firefox120 fixed)
RESOLVED
FIXED
mozilla55
People
(Reporter: RyanVM, Assigned: robwu)
References
Details
(Keywords: intermittent-failure, Whiteboard: [stockwell disabled])
Attachments
(2 files)
Spiking again recently.
https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=103725627
Comment hidden (Intermittent Failures Robot) |
Comment 2•8 years ago
|
||
doing some retriggers here to narrow this down:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=android%20debug%2048&tochange=5600d485769bed18de1fc9e8ba8e3f7e66bc771b&fromchange=59fa62ee9d7537bf9dd5e75ef77a6561b396fd21
34 failures in a day is very high failure rate! While waiting for the results, I would like to get the dev team aware of this and figure out who will work on this. I will update the bug with information when the retriggers are done.
:andym- can you help find someone to look into this?
Flags: needinfo?(amckay)
Whiteboard: [stockwell needswork]
Comment 3•8 years ago
|
||
:bkelly- this appears to be related to your change from bug 1363829. I know you worked hard to address issues prior to deployment and were expecting a few random things to show up, here you can see the difference:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=android%20debug%2048&tochange=9bf2eafa8defa9b0d69bfe8c757ca69826543e9b&fromchange=83f3573ea26840208086aa5bd7d41aff37a0ddf1
Blocks: 1363829
Flags: needinfo?(amckay) → needinfo?(bkelly)
Comment 4•8 years ago
|
||
I'll take a look on Monday. I saw this one with an earlier version of my patches, but it went away after some timing tweaks. I guess I was incorrect about that.
Comment 5•8 years ago
|
||
Kris, it looks like you looked at this test before in bug 1312161. Do you have any ideas? I'm going to investigate on Monday, but I don't really know our extension internals or test framework. Thanks!
It looks like its not seeing the about:blank window:
[task 2017-06-02T19:59:21.356964Z] 19:59:21 INFO - 8 INFO SpawnTask.js | Entering test test_contentscript_about_blank
[task 2017-06-02T19:59:21.357017Z] 19:59:21 INFO - 9 INFO Extension loaded
[task 2017-06-02T19:59:21.357065Z] 19:59:21 INFO - Buffered messages logged at 19:54:30
[task 2017-06-02T19:59:21.357116Z] 19:59:21 INFO - 10 INFO script ran: all,about:srcdoc,about:srcdoc
[task 2017-06-02T19:59:21.357161Z] 19:59:21 INFO - Buffered messages logged at 19:54:31
[task 2017-06-02T19:59:21.357230Z] 19:59:21 INFO - 11 INFO script ran: all,top,http://example.com/tests/toolkit/components/extensions/test/mochitest/file_with_about_blank.html
[task 2017-06-02T19:59:21.357279Z] 19:59:21 INFO - Buffered messages finished
[task 2017-06-02T19:59:21.357359Z] 19:59:21 INFO - 12 INFO TEST-UNEXPECTED-FAIL | toolkit/components/extensions/test/mochitest/test_ext_contentscript_about_blank.html | Test timed out.
My patches changed the timing of setTimeout() slightly, but its unclear to me how that factors into web extensions or this test.
Flags: needinfo?(kmaglione+bmo)
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Updated•8 years ago
|
Assignee: nobody → bkelly
Status: NEW → ASSIGNED
Flags: needinfo?(bkelly)
Comment 8•8 years ago
|
||
Looks like the extension-process-scripts.js code just changed a lot thanks to bug 1368102. I wonder if this still triggers.
Comment 9•8 years ago
|
||
Doing some android debug M(48) retriggers here:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=275588f4d852d7dc183a9dcc70a311413dc7a063&selectedJob=104477895
Comment 10•8 years ago
|
||
Looks like there was at least one failure after that change landed:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=19acfff499d3ee44cb09ed76b43646c146d06f58
Push some debug:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c3302469562c5f090a658186cfca1a46aa9b2a3a
Comment 11•8 years ago
|
||
Ugh. Android mochitests split logcat output from the test output so its really hard to line things up. And these tests take ~90 minutes to run.
Comment 12•8 years ago
|
||
I see this in a failing log:
06-05 10:51:21.922 789 809 E GeckoConsole: [JavaScript Error: "TypeError: win is null" {file: "resource://gre/modules/ExtensionUtils.jsm" line: 143}]
06-05 10:51:22.412 789 809 E GeckoConsole: [JavaScript Error: "TypeError: win is null" {file: "resource://gre/modules/ExtensionUtils.jsm" line: 143}]
06-05 10:51:22.412 789 809 E GeckoConsole: [JavaScript Error: "TypeError: win is null" {file: "resource://gre/modules/ExtensionUtils.jsm" line: 143}]
Comment 13•8 years ago
|
||
Comment 14•8 years ago
|
||
The window is null in _winUtils(), but not from the getInnerWindowID() function:
06-05 14:18:20.388 786 801 I Gecko : ### ### ExtensionUtils.jsm getInnerWindowID([object Window])
06-05 14:18:22.579 786 801 I Gecko : ### ### ExtensionUtils.jsm _winUtils(null)
It must be coming from one of these call sites:
http://searchfox.org/mozilla-central/source/toolkit/components/extensions/ExtensionContent.jsm#237
http://searchfox.org/mozilla-central/source/toolkit/components/extensions/ExtensionContent.jsm#286
http://searchfox.org/mozilla-central/source/toolkit/components/extensions/ExtensionTabs.jsm#1172
http://searchfox.org/mozilla-central/source/toolkit/components/extensions/ext-browser-content.js#81
http://searchfox.org/mozilla-central/source/toolkit/components/extensions/ext-browser-content.js#125
Comment 15•8 years ago
|
||
Comment hidden (Intermittent Failures Robot) |
Comment 17•8 years ago
|
||
It seems its coming from the Script.inject() code in ExtensionContent.jsm:
06-05 19:33:49.955 797 817 I Gecko : ### ### ExtensionContent.jsm Script.inject([object Object]) window: null
06-05 19:33:49.965 797 817 I Gecko : ### ### ExtensionUtils.jsm _winUtils(null)
Comment 18•8 years ago
|
||
It seems inject() already has a short-circuit for `!window` here. Maybe we can just do that a bit earlier.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=7a4bc7f575123e63465d3b0b986a6d4905b01cf6
Comment 19•8 years ago
|
||
The short circuit prevents the exceptions, but the timeout still happens. I guess that makes sense if its preventing the extension from being injected into the page.
I could really use some help from the extension team to figure out what is going on.
Comment 20•8 years ago
|
||
Comment 21•8 years ago
|
||
(In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #17)
> It seems its coming from the Script.inject() code in ExtensionContent.jsm:
>
> 06-05 19:33:49.955 797 817 I Gecko : ### ### ExtensionContent.jsm
> Script.inject([object Object]) window: null
> 06-05 19:33:49.965 797 817 I Gecko : ### ### ExtensionUtils.jsm
> _winUtils(null)
That usually indicates a timing issue. The injection happens asynchronously, and we null out the contentWindow property if the window navigates or is destroyed after the context is created.
Unfortunately, the logic for injecting things into about:blank is pretty hairy, since about:blank documents don't behave like most other documents. For a start, they can be generated in two completely different ways, both of which are different from the way we generate normal documents, which means that things like document-element-inserted observers don't work. about:blank documents also tend to be created early for new docShells, and then reused to load the actual initial document, which means that knowing whether a new document is *actually* an about:blank document is now always easy.
Updated•8 years ago
|
Flags: needinfo?(kmaglione+bmo)
Comment 22•8 years ago
|
||
Can you point me to the code that calls inject() asynchronous for about:blank?
Comment hidden (Intermittent Failures Robot) |
Comment 24•8 years ago
|
||
(In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #22)
> Can you point me to the code that calls inject() asynchronous for
> about:blank?
All of the asynchrony is at:
http://searchfox.org/mozilla-central/rev/20963d7269b1b14d455f47bc0260d0653015bf84/toolkit/components/extensions/ExtensionContent.jsm#255-265,325-329
And it's triggered from:
http://searchfox.org/mozilla-central/rev/20963d7269b1b14d455f47bc0260d0653015bf84/toolkit/components/extensions/ExtensionPolicyService.cpp#264-289
Comment 25•8 years ago
|
||
(In reply to Kris Maglione [:kmag] (busy; behind on reviews) from comment #24)
> (In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #22)
> > Can you point me to the code that calls inject() asynchronous for
> > about:blank?
>
> All of the asynchrony is at:
>
> http://searchfox.org/mozilla-central/rev/
> 20963d7269b1b14d455f47bc0260d0653015bf84/toolkit/components/extensions/
> ExtensionContent.jsm#255-265,325-329
Does it make sense that:
let context = this.extension.getContext(window);
Can return a context with null `context.contentWindow`? Could we fix the contentWindow property there if we see its currently null?
Flags: needinfo?(kmaglione+bmo)
Comment 26•8 years ago
|
||
Also, since there are known issues with the about:blank handling, would you be ok if I disabled this test on android debug? The android emulator is exceptionally slow compared to real devices (even low end ones).
Comment 27•8 years ago
|
||
Comment 28•8 years ago
|
||
(In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #25)
> Does it make sense that:
>
> let context = this.extension.getContext(window);
>
> Can return a context with null `context.contentWindow`?
It can't return a context with a null contentWindow, but the contentWindow property can become null if the window navigates after the context is created, which is what's happening here.
> Could we fix the contentWindow property there if we see its currently null?
No, the property is null intentionally, because when the inner window that the
context was created for is not current, there's no way to access it from
JavaScript, and accidentally accessing the incorrect inner window instead
would be a major security issue.
Flags: needinfo?(kmaglione+bmo)
Comment 29•8 years ago
|
||
(In reply to Kris Maglione [:kmag] (busy; behind on reviews) from comment #28)
> (In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #25)
> > Does it make sense that:
> >
> > let context = this.extension.getContext(window);
> >
> > Can return a context with null `context.contentWindow`?
>
> It can't return a context with a null contentWindow, but the contentWindow
> property can become null if the window navigates after the context is
> created, which is what's happening here.
I don't understand how that could be happening in this test.
Comment 30•8 years ago
|
||
(In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #29)
> (In reply to Kris Maglione [:kmag] (busy; behind on reviews) from comment
> #28)
> > (In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #25)
> > > Does it make sense that:
> > >
> > > let context = this.extension.getContext(window);
> > >
> > > Can return a context with null `context.contentWindow`?
> >
> > It can't return a context with a null contentWindow, but the contentWindow
> > property can become null if the window navigates after the context is
> > created, which is what's happening here.
>
> I don't understand how that could be happening in this test.
The context is created. We call promiseDocumentReady or promiseDocumentLoaded and await the result. The page navigates, fires the DOMContentLoaded and load events, which causes the load promises to resolve, and then immediately fires pagehide, which causes the contentWindow property to become null. The promise resolution handler triggers the async injectInto function to resume, with the contentWindow property now null, which then calls inject().
We should probably null check at that point, and do something sensible rather than calling inject(), but I don't think that would fix the failing test.
Comment 31•8 years ago
|
||
This test does not navigate the page after the initial load AFAICT. So I don't understand where that navigate comes from.
Comment 32•8 years ago
|
||
Comment 33•8 years ago
|
||
Most likely it's a race from the initial about:blank document being replaced with something else.
Comment 34•8 years ago
|
||
The about:blank is a plain old iframe in the loaded document:
https://dxr.mozilla.org/mozilla-central/source/toolkit/components/extensions/test/mochitest/file_with_about_blank.html
I don't think that should be getting a replacement. But even if it did, shouldn't the test pick of the new window that replaces it and then pass?
Comment 35•8 years ago
|
||
(In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #34)
> The about:blank is a plain old iframe in the loaded document:
>
> https://dxr.mozilla.org/mozilla-central/source/toolkit/components/extensions/
> test/mochitest/file_with_about_blank.html
>
> I don't think that should be getting a replacement.
In a lot of circumstances, it should, if we're forced to create an about:blank
content viewer before the explicit about:blank load happens.
> But even if it did, shouldn't the test pick of the new window that replaces
> it and then pass?
If the new window were anything other than about:blank, yes, since we use
document-element-inserted observers to detect most loads. But about:blank
loads don't trigger document-element-inserted, and content-document-global-created
is not triggered when the original about:blank document is reused.
Comment hidden (Intermittent Failures Robot) |
Comment 37•8 years ago
|
||
It looks like in the failure cases we are getting some additional pagehide events which are clearing the contentWindow properties on the context.
At this point I think I've gone as far as I have time. I'm going to disable on android debug for now. If the web extension team sorts out the races in play here they can re-enable.
Does that sound reasonable to everyone?
Comment 38•8 years ago
|
||
Yes, that sounds fine to me.
Comment 39•8 years ago
|
||
This disables the test on android. I included opt since brasstacks suggests it fails there as well at a much lower rate.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1ac81f3ceb87c2cb94528f96375b86564e8b1edf
Attachment #8875881 -
Flags: review?(kmaglione+bmo)
Updated•8 years ago
|
Attachment #8875881 -
Flags: review?(kmaglione+bmo) → review+
Comment hidden (Intermittent Failures Robot) |
Comment 41•8 years ago
|
||
Pushed by bkelly@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/40a8be84033f
Disable test_ext_contentscript_about_blank.html on android due to intermittent failures. r=kmag
Comment 42•8 years ago
|
||
bugherder |
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
status-firefox55:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Updated•8 years ago
|
Whiteboard: [stockwell needswork] → [stockwell disabled]
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 45•7 years ago
|
||
This is still failed on Android, so I'm going to reopen it. It looks like there was a lot of investigation and work done on this already, so maybe someone can pick up where that left off. It's not a high priority, but we shouldn't just ignore the fact that it's disabled.
Assignee: bkelly → nobody
Status: RESOLVED → REOPENED
Component: WebExtensions: Untriaged → WebExtensions: General
Priority: -- → P3
Resolution: FIXED → ---
Comment 46•7 years ago
|
||
This was disabled on android in comment 41. When was it re-enabled? Perhaps it should block that bug.
Updated•7 years ago
|
Product: Toolkit → WebExtensions
Updated•4 years ago
|
Severity: normal → S4
OS: Unspecified → Android
Comment hidden (Intermittent Failures Robot) |
Comment 48•1 years ago
|
||
This test was disabled because it was failing intermittently on Fennec, but I wasn't able to hit a failure when running it locally on GeckoView.
Push to try:
Assignee | ||
Comment 49•1 year ago
|
||
test_ext_contentscript_about_blank.html had multiple skip-ifs:
-
Android - no longer needed to be skipped.
-
condprof - probably because another tab was still open. Fixed by
making the match pattern more specific (to include the file). -
http2 / http3 - fixed by making the match pattern also match https (by
using the wildcard scheme). Note that http2/http3 requests are
automatically upgraded from http://mochi.test:8888 to https://mochi.test
with --use-http3-server.
Updated•1 year ago
|
Assignee: nobody → rob
Comment 50•1 year ago
|
||
Pushed by rob@robwu.nl:
https://hg.mozilla.org/integration/autoland/rev/48f113784f3d
Fix and remove skip-if from test_ext_contentscript_about_blank.html r=rpl
Comment 51•1 year ago
|
||
bugherder |
Status: REOPENED → RESOLVED
Closed: 8 years ago → 1 year ago
status-firefox120:
--- → fixed
Resolution: --- → FIXED
Comment hidden (Intermittent Failures Robot) |
You need to log in
before you can comment on or make changes to this bug.
Description
•