Intermittent toolkit/components/extensions/test/mochitest/test_ext_contentscript_about_blank.html | Test timed out.

REOPENED
Unassigned

Status

()

Toolkit
WebExtensions: General
P3
normal
REOPENED
6 months ago
21 days ago

People

(Reporter: RyanVM, Unassigned)

Tracking

({intermittent-failure})

unspecified
mozilla55
intermittent-failure
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox55 fixed)

Details

(Whiteboard: [stockwell disabled])

Attachments

(1 attachment)

(Reporter)

Description

6 months ago
Spiking again recently.

https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=103725627

Comment 1

6 months ago
34 failures in 155 pushes (0.219 failures/push) were associated with this bug yesterday.   

Repository breakdown:
* autoland: 18
* mozilla-inbound: 12
* mozilla-central: 3
* graphics: 1

Platform breakdown:
* android-4-3-armv7-api15: 34

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1369440&startday=2017-06-01&endday=2017-06-01&tree=all
doing some retriggers here to narrow this down:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=android%20debug%2048&tochange=5600d485769bed18de1fc9e8ba8e3f7e66bc771b&fromchange=59fa62ee9d7537bf9dd5e75ef77a6561b396fd21

34 failures in a day is very high failure rate!  While waiting for the results, I would like to get the dev team aware of this and figure out who will work on this.  I will update the bug with information when the retriggers are done.

:andym- can you help find someone to look into this?
Flags: needinfo?(amckay)
Whiteboard: [stockwell needswork]
:bkelly- this appears to be related to your change from bug 1363829.  I know you worked hard to address issues prior to deployment and were expecting a few random things to show up, here you can see the difference:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=android%20debug%2048&tochange=9bf2eafa8defa9b0d69bfe8c757ca69826543e9b&fromchange=83f3573ea26840208086aa5bd7d41aff37a0ddf1
Blocks: 1363829
Flags: needinfo?(amckay) → needinfo?(bkelly)

Comment 4

6 months ago
I'll take a look on Monday.  I saw this one with an earlier version of my patches, but it went away after some timing tweaks.  I guess I was incorrect about that.

Comment 5

6 months ago
Kris, it looks like you looked at this test before in bug 1312161.  Do you have any ideas?  I'm going to investigate on Monday, but I don't really know our extension internals or test framework.  Thanks!

It looks like its not seeing the about:blank window:

[task 2017-06-02T19:59:21.356964Z] 19:59:21     INFO -  8 INFO SpawnTask.js | Entering test test_contentscript_about_blank
[task 2017-06-02T19:59:21.357017Z] 19:59:21     INFO -  9 INFO Extension loaded
[task 2017-06-02T19:59:21.357065Z] 19:59:21     INFO -  Buffered messages logged at 19:54:30
[task 2017-06-02T19:59:21.357116Z] 19:59:21     INFO -  10 INFO script ran: all,about:srcdoc,about:srcdoc
[task 2017-06-02T19:59:21.357161Z] 19:59:21     INFO -  Buffered messages logged at 19:54:31
[task 2017-06-02T19:59:21.357230Z] 19:59:21     INFO -  11 INFO script ran: all,top,http://example.com/tests/toolkit/components/extensions/test/mochitest/file_with_about_blank.html
[task 2017-06-02T19:59:21.357279Z] 19:59:21     INFO -  Buffered messages finished
[task 2017-06-02T19:59:21.357359Z] 19:59:21     INFO -  12 INFO TEST-UNEXPECTED-FAIL | toolkit/components/extensions/test/mochitest/test_ext_contentscript_about_blank.html | Test timed out.

My patches changed the timing of setTimeout() slightly, but its unclear to me how that factors into web extensions or this test.
Flags: needinfo?(kmaglione+bmo)

Comment 6

6 months ago
58 failures in 149 pushes (0.389 failures/push) were associated with this bug yesterday.   

Repository breakdown:
* autoland: 33
* mozilla-inbound: 20
* mozilla-central: 4
* try: 1

Platform breakdown:
* android-4-3-armv7-api15: 56
* linux64-stylo: 2

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1369440&startday=2017-06-02&endday=2017-06-02&tree=all

Comment 7

6 months ago
116 failures in 820 pushes (0.141 failures/push) were associated with this bug in the last 7 days. 

This is the #5 most frequent failure this week. 

** This failure happened more than 75 times this week! Resolving this bug is a very high priority. **

** Try to resolve this bug as soon as possible. If unresolved for 1 week, the affected test(s) may be disabled. **  

Repository breakdown:
* autoland: 65
* mozilla-inbound: 41
* mozilla-central: 8
* try: 1
* graphics: 1

Platform breakdown:
* android-4-3-armv7-api15: 114
* linux64-stylo: 2

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1369440&startday=2017-05-29&endday=2017-06-04&tree=all

Updated

6 months ago
Assignee: nobody → bkelly
Status: NEW → ASSIGNED
Flags: needinfo?(bkelly)

Comment 8

6 months ago
Looks like the extension-process-scripts.js code just changed a lot thanks to bug 1368102.  I wonder if this still triggers.

Comment 9

6 months ago
Doing some android debug M(48) retriggers here:

https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=275588f4d852d7dc183a9dcc70a311413dc7a063&selectedJob=104477895

Comment 10

6 months ago
Looks like there was at least one failure after that change landed:

https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=19acfff499d3ee44cb09ed76b43646c146d06f58

Push some debug:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=c3302469562c5f090a658186cfca1a46aa9b2a3a

Comment 11

6 months ago
Ugh.  Android mochitests split logcat output from the test output so its really hard to line things up.  And these tests take ~90 minutes to run.

Comment 12

6 months ago
I see this in a failing log:

06-05 10:51:21.922   789   809 E GeckoConsole: [JavaScript Error: "TypeError: win is null" {file: "resource://gre/modules/ExtensionUtils.jsm" line: 143}]
06-05 10:51:22.412   789   809 E GeckoConsole: [JavaScript Error: "TypeError: win is null" {file: "resource://gre/modules/ExtensionUtils.jsm" line: 143}]
06-05 10:51:22.412   789   809 E GeckoConsole: [JavaScript Error: "TypeError: win is null" {file: "resource://gre/modules/ExtensionUtils.jsm" line: 143}]

Comment 13

6 months ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=dda228239cb51d5a28dfcc691bbea1819d34d5ab

Comment 14

6 months ago
The window is null in _winUtils(), but not from the getInnerWindowID() function:

06-05 14:18:20.388   786   801 I Gecko   : ### ### ExtensionUtils.jsm getInnerWindowID([object Window])
06-05 14:18:22.579   786   801 I Gecko   : ### ### ExtensionUtils.jsm _winUtils(null)

It must be coming from one of these call sites:

http://searchfox.org/mozilla-central/source/toolkit/components/extensions/ExtensionContent.jsm#237
http://searchfox.org/mozilla-central/source/toolkit/components/extensions/ExtensionContent.jsm#286
http://searchfox.org/mozilla-central/source/toolkit/components/extensions/ExtensionTabs.jsm#1172
http://searchfox.org/mozilla-central/source/toolkit/components/extensions/ext-browser-content.js#81
http://searchfox.org/mozilla-central/source/toolkit/components/extensions/ext-browser-content.js#125

Comment 15

6 months ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=e051b2932f441c9c194d56858a903d368996cebf

Comment 16

6 months ago
63 failures in 148 pushes (0.426 failures/push) were associated with this bug yesterday.   

Repository breakdown:
* autoland: 29
* mozilla-inbound: 23
* mozilla-central: 10
* try: 1

Platform breakdown:
* android-4-3-armv7-api15: 63

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1369440&startday=2017-06-05&endday=2017-06-05&tree=all

Comment 17

6 months ago
It seems its coming from the Script.inject() code in ExtensionContent.jsm:

06-05 19:33:49.955   797   817 I Gecko   : ### ### ExtensionContent.jsm Script.inject([object Object]) window: null
06-05 19:33:49.965   797   817 I Gecko   : ### ### ExtensionUtils.jsm _winUtils(null)

Comment 18

6 months ago
It seems inject() already has a short-circuit for `!window` here.  Maybe we can just do that a bit earlier.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=7a4bc7f575123e63465d3b0b986a6d4905b01cf6

Comment 19

6 months ago
The short circuit prevents the exceptions, but the timeout still happens.  I guess that makes sense if its preventing the extension from being injected into the page.

I could really use some help from the extension team to figure out what is going on.

Comment 20

6 months ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c357fd7bfa48ba9a97f02a516cc545bcca811e62
(In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #17)
> It seems its coming from the Script.inject() code in ExtensionContent.jsm:
> 
> 06-05 19:33:49.955   797   817 I Gecko   : ### ### ExtensionContent.jsm
> Script.inject([object Object]) window: null
> 06-05 19:33:49.965   797   817 I Gecko   : ### ### ExtensionUtils.jsm
> _winUtils(null)

That usually indicates a timing issue. The injection happens asynchronously, and we null out the contentWindow property if the window navigates or is destroyed after the context is created.

Unfortunately, the logic for injecting things into about:blank is pretty hairy, since about:blank documents don't behave like most other documents. For a start, they can be generated in two completely different ways, both of which are different from the way we generate normal documents, which means that things like document-element-inserted observers don't work. about:blank documents also tend to be created early for new docShells, and then reused to load the actual initial document, which means that knowing whether a new document is *actually* an about:blank document is now always easy.
Flags: needinfo?(kmaglione+bmo)

Comment 22

6 months ago
Can you point me to the code that calls inject() asynchronous for about:blank?

Comment 23

6 months ago
33 failures in 132 pushes (0.25 failures/push) were associated with this bug yesterday.   

Repository breakdown:
* autoland: 15
* mozilla-inbound: 13
* mozilla-central: 3
* try: 2

Platform breakdown:
* android-4-3-armv7-api15: 33

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1369440&startday=2017-06-06&endday=2017-06-06&tree=all
(In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #22)
> Can you point me to the code that calls inject() asynchronous for
> about:blank?

All of the asynchrony is at:

http://searchfox.org/mozilla-central/rev/20963d7269b1b14d455f47bc0260d0653015bf84/toolkit/components/extensions/ExtensionContent.jsm#255-265,325-329

And it's triggered from:

http://searchfox.org/mozilla-central/rev/20963d7269b1b14d455f47bc0260d0653015bf84/toolkit/components/extensions/ExtensionPolicyService.cpp#264-289

Comment 25

6 months ago
(In reply to Kris Maglione [:kmag] (busy; behind on reviews) from comment #24)
> (In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #22)
> > Can you point me to the code that calls inject() asynchronous for
> > about:blank?
> 
> All of the asynchrony is at:
> 
> http://searchfox.org/mozilla-central/rev/
> 20963d7269b1b14d455f47bc0260d0653015bf84/toolkit/components/extensions/
> ExtensionContent.jsm#255-265,325-329

Does it make sense that:

    let context = this.extension.getContext(window);

Can return a context with null `context.contentWindow`?  Could we fix the contentWindow property there if we see its currently null?
Flags: needinfo?(kmaglione+bmo)

Comment 26

6 months ago
Also, since there are known issues with the about:blank handling, would you be ok if I disabled this test on android debug?  The android emulator is exceptionally slow compared to real devices (even low end ones).

Comment 27

6 months ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=7116dd644c90321e575e6dd637e3da1a113bbdb3
(In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #25)
> Does it make sense that:
> 
>     let context = this.extension.getContext(window);
> 
> Can return a context with null `context.contentWindow`?

It can't return a context with a null contentWindow, but the contentWindow property can become null if the window navigates after the context is created, which is what's happening here.

> Could we fix the contentWindow property there if we see its currently null?

No, the property is null intentionally, because when the inner window that the
context was created for is not current, there's no way to access it from
JavaScript, and accidentally accessing the incorrect inner window instead
would be a major security issue.
Flags: needinfo?(kmaglione+bmo)

Comment 29

6 months ago
(In reply to Kris Maglione [:kmag] (busy; behind on reviews) from comment #28)
> (In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #25)
> > Does it make sense that:
> > 
> >     let context = this.extension.getContext(window);
> > 
> > Can return a context with null `context.contentWindow`?
> 
> It can't return a context with a null contentWindow, but the contentWindow
> property can become null if the window navigates after the context is
> created, which is what's happening here.

I don't understand how that could be happening in this test.
(In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #29)
> (In reply to Kris Maglione [:kmag] (busy; behind on reviews) from comment
> #28)
> > (In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #25)
> > > Does it make sense that:
> > > 
> > >     let context = this.extension.getContext(window);
> > > 
> > > Can return a context with null `context.contentWindow`?
> > 
> > It can't return a context with a null contentWindow, but the contentWindow
> > property can become null if the window navigates after the context is
> > created, which is what's happening here.
> 
> I don't understand how that could be happening in this test.

The context is created. We call promiseDocumentReady or promiseDocumentLoaded and await the result. The page navigates, fires the DOMContentLoaded and load events, which causes the load promises to resolve, and then immediately fires pagehide, which causes the contentWindow property to become null. The promise resolution handler triggers the async injectInto function to resume, with the contentWindow property now null, which then calls inject().

We should probably null check at that point, and do something sensible rather than calling inject(), but I don't think that would fix the failing test.

Comment 31

6 months ago
This test does not navigate the page after the initial load AFAICT.  So I don't understand where that navigate comes from.

Comment 32

6 months ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=9047257f4c236981f97d70cc89523c77253fce8f
Most likely it's a race from the initial about:blank document being replaced with something else.

Comment 34

6 months ago
The about:blank is a plain old iframe in the loaded document:

https://dxr.mozilla.org/mozilla-central/source/toolkit/components/extensions/test/mochitest/file_with_about_blank.html

I don't think that should be getting a replacement.  But even if it did, shouldn't the test pick of the new window that replaces it and then pass?
(In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #34)
> The about:blank is a plain old iframe in the loaded document:
> 
> https://dxr.mozilla.org/mozilla-central/source/toolkit/components/extensions/
> test/mochitest/file_with_about_blank.html
> 
> I don't think that should be getting a replacement.

In a lot of circumstances, it should, if we're forced to create an about:blank
content viewer before the explicit about:blank load happens.

> But even if it did, shouldn't the test pick of the new window that replaces
> it and then pass?

If the new window were anything other than about:blank, yes, since we use
document-element-inserted observers to detect most loads. But about:blank
loads don't trigger document-element-inserted, and content-document-global-created
is not triggered when the original about:blank document is reused.

Comment 36

6 months ago
61 failures in 182 pushes (0.335 failures/push) were associated with this bug yesterday.   

Repository breakdown:
* autoland: 28
* mozilla-inbound: 27
* try: 3
* mozilla-central: 3

Platform breakdown:
* android-4-3-armv7-api15: 61

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1369440&startday=2017-06-07&endday=2017-06-07&tree=all

Comment 37

6 months ago
It looks like in the failure cases we are getting some additional pagehide events which are clearing the contentWindow properties on the context.

At this point I think I've gone as far as I have time.  I'm going to disable on android debug for now.  If the web extension team sorts out the races in play here they can re-enable.

Does that sound reasonable to everyone?
Yes, that sounds fine to me.

Comment 39

6 months ago
Created attachment 8875881 [details] [diff] [review]
Disable test_ext_contentscript_about_blank.html on android due to intermittent failures. r=kmag

This disables the test on android.  I included opt since brasstacks suggests it fails there as well at a much lower rate.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=1ac81f3ceb87c2cb94528f96375b86564e8b1edf
Attachment #8875881 - Flags: review?(kmaglione+bmo)
Attachment #8875881 - Flags: review?(kmaglione+bmo) → review+

Comment 40

6 months ago
38 failures in 172 pushes (0.221 failures/push) were associated with this bug yesterday.   

Repository breakdown:
* autoland: 21
* mozilla-inbound: 15
* mozilla-central: 2

Platform breakdown:
* android-4-3-armv7-api15: 38

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1369440&startday=2017-06-08&endday=2017-06-08&tree=all

Comment 41

6 months ago
Pushed by bkelly@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/40a8be84033f
Disable test_ext_contentscript_about_blank.html on android due to intermittent failures. r=kmag

Comment 42

5 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/40a8be84033f
Status: ASSIGNED → RESOLVED
Last Resolved: 5 months ago
status-firefox55: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Whiteboard: [stockwell needswork] → [stockwell disabled]

Comment 43

5 months ago
15 failures in 153 pushes (0.098 failures/push) were associated with this bug yesterday.   

Repository breakdown:
* autoland: 13
* mozilla-central: 2

Platform breakdown:
* android-4-3-armv7-api15: 15

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1369440&startday=2017-06-09&endday=2017-06-09&tree=all

Comment 44

5 months ago
230 failures in 864 pushes (0.266 failures/push) were associated with this bug in the last 7 days. 

This is the #6 most frequent failure this week. 

** This failure happened more than 75 times this week! Resolving this bug is a very high priority. **

** Try to resolve this bug as soon as possible. If unresolved for 1 week, the affected test(s) may be disabled. **  

Repository breakdown:
* autoland: 122
* mozilla-inbound: 81
* mozilla-central: 20
* try: 7

Platform breakdown:
* android-4-3-armv7-api15: 230

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1369440&startday=2017-06-05&endday=2017-06-11&tree=all
This is still failed on Android, so I'm going to reopen it. It looks like there was a lot of investigation and work done on this already, so maybe someone can pick up where that left off. It's not a high priority, but we shouldn't just ignore the fact that it's disabled.
Assignee: bkelly → nobody
Status: RESOLVED → REOPENED
Component: WebExtensions: Untriaged → WebExtensions: General
Priority: -- → P3
Resolution: FIXED → ---

Comment 46

21 days ago
This was disabled on android in comment 41.  When was it re-enabled?  Perhaps it should block that bug.
You need to log in before you can comment on or make changes to this bug.