Closed Bug 1468890 Opened 6 years ago Closed 5 years ago

Intermittent browser/components/customizableui/test/browser_967000_button_charEncoding.js | Test timed out -

Categories

(Firefox :: Toolbars and Customization, defect, P1)

defect

Tracking

()

RESOLVED FIXED
Tracking Status
firefox-esr60 --- unaffected
firefox-esr68 --- wontfix
firefox69 --- unaffected
firefox70 --- wontfix
firefox71 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: dao)

References

(Regression)

Details

(Keywords: intermittent-failure, regression, Whiteboard: fixed by bug 1582797)

Attachments

(1 file)

Filed by: dluca [at] mozilla.com https://treeherder.mozilla.org/logviewer.html#?job_id=183277550&repo=autoland https://queue.taskcluster.net/v1/task/SMxnq31xRK2Xk2dvND-rbA/runs/0/artifacts/public/logs/live_backing.log TEST-START | browser/components/customizableui/test/browser_967000_button_charEncoding.js 01:16:12 INFO - TEST-INFO | started process screenshot 01:16:12 INFO - TEST-INFO | screenshot: exit 0 01:16:12 INFO - Buffered messages logged at 01:15:27 01:16:12 INFO - 1053 INFO Entering test bound 01:16:12 INFO - 1054 INFO Check Character Encoding button functionality 01:16:12 INFO - 1055 INFO Waiting for overflow button to have non-0 size 01:16:12 INFO - Buffered messages finished 01:16:12 ERROR - 1056 INFO TEST-UNEXPECTED-FAIL | browser/components/customizableui/test/browser_967000_button_charEncoding.js | Test timed out - 01:16:12 INFO - GECKO(804) | MEMORY STAT | vsize 804MB | vsizeMaxContiguous 305MB | residentFast 267MB | heapAllocated 107MB 01:16:12 INFO - 1057 INFO TEST-OK | browser/components/customizableui/test/browser_967000_button_charEncoding.js | took 45062ms 01:16:12 INFO - 1058 INFO checking window state 01:16:12 INFO - 1059 INFO TEST-START | browser/components/customizableui/test/browser_967000_button_feeds.js 01:16:12 INFO - Not taking screenshot here: see the one that was previously logged 01:16:12 INFO - Buffered messages logged at 01:16:12 01:16:12 INFO - 1060 INFO Entering test bound 01:16:12 INFO - 1061 INFO Check Subscribe button functionality 01:16:12 INFO - 1062 INFO Waiting for overflow button to have non-0 size 01:16:12 INFO - Buffered messages finished 01:16:12 ERROR - 1063 INFO TEST-UNEXPECTED-FAIL | browser/components/customizableui/test/browser_967000_button_charEncoding.js | Uncaught exception received from previously timed out test - at chrome://mochitests/content/browser/browser/components/customizableui/test/browser_967000_button_charEncoding.js:22 - ReferenceError: ok is not defined 01:16:12 INFO - Stack trace: 01:16:12 INFO - @chrome://mochitests/content/browser/browser/components/customizableui/test/browser_967000_button_charEncoding.js:22:3 01:16:12 INFO - Async*Tester_execTest/<@chrome://mochikit/content/browser-test.js:1098:34 01:16:12 INFO - async*Tester_execTest@chrome://mochikit/content/browser-test.js:1089:16 01:16:12 INFO - nextTest/<@chrome://mochikit/content/browser-test.js:991:9 01:16:12 INFO - SimpleTest.waitForFocus/waitForFocusInner/focusedOrLoaded/<@chrome://mochikit/content/tests/SimpleTest/SimpleTest.js:795:59
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Priority: -- → P3

This bug failed 82 times in the last 7 days. Occurs on linux64 and linux64-shippable on opt build types.

Recent log:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=262074533&repo=mozilla-central&lineNumber=3580

Flags: needinfo?(gijskruitbosch+bugs)
Pushed by rmaries@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/26898211539f Disabled browser_967000_button_charEncoding.js on Linux. r=jmaher
Keywords: leave-open
Whiteboard: [stockwell disable-recommended] → [stockwell disabled]

I wasn't made aware of this bug - I thought the plan was to clear priorities and needinfo triage owners when intermittents played up - before disabling, instead of after?

Anyway, to do something here, we need a regression window. We've been looking in bug 1498063. Unfortunately, it's not very clear what tripped this. Perhaps in this bug it's easier to find a regression window, given it's higher frequency?

Flags: needinfo?(rmaries)
Flags: needinfo?(jmaher)
Flags: needinfo?(gijskruitbosch+bugs)
See Also: → 1498063

Gijs, I've started some retriggers / backfill and I will update the bug when I will find a culprit.

Flags: needinfo?(rmaries)

This is particularly disturbing because the screenshots for failures do not show an overflow button, but then the ok not defined errors seem to suggest that we make it past the "show the overflow button and show its popup" awaits right after the test times out, which makes very little sense. That is, esp. on shippable linux, it really shouldn't take particularly long for these 2 things to happen (ie they shouldn't cause a timeout) and even if they did, the sequence of events doesn't match the screenshot so the whole thing is nonsensical.

So atm I don't know what's going on.

Priority: P3 → --
Flags: needinfo?(gijskruitbosch+bugs)
Regressed by: 1539651
Keywords: regression

:gijs, sometimes a failure is rapid in a few days and it hits our threshold. We are getting stricter on this as:

  1. new high frequency intermittents have a root cause in the last few days, backout
  2. hit our stockwell disabled threshold- disable and let developer know so we they can prioritize a fix and re-enable the test

We have set the threshold for disabling tests and the intention and documented path is we disable, in reality that wasn't the case. In the near future when we have robots toggling these manifests, there will be no judgement calls which avoids confusion like has happened when we didn't follow our guidelines as we stated them 2 years ago.

It is really easy to re-enable a test, if the test is that important when you are needinfo'd that it is disabled please fix it as a P1 and re-enable it. Otherwise a test that fails so frequently will not be finding regressions.

Flags: needinfo?(jmaher)

(In reply to Cosmin Sabou [:CosminS] from comment #60)

Hi, this bug together with bug 1498063 spiked in failure rate after Bug 1539651 landed here: https://treeherder.mozilla.org/#/jobs?repo=autoland&searchStr=linux%2Cx64%2Cshippable%2Copt%2Cmochitests%2Ctest-linux64-shippable%2Fopt-mochitest-browser-chrome-e10s-2%2Cm%28bc2%29&tochange=000157acfe106a62c4c84fe98fdeb8d27ade95ee&fromchange=ce83fa75ae32c5a9c0aa8722a72cb6cd5ecceedf&selectedJob=262351314

https://hg.mozilla.org/integration/autoland/rev/bd7cd9d7b6ade7bf9fa8441a8686f19c36f2a7e5

Before these changes this failed about 1 or 2 times on 20 runs, after this it fails 4-5 times on 20 runs. Alexander, please take a look over these two bugs. Thank you.

Both of these failures are preceded by JS failure [1] and [2]
modules/UrlbarInput.jsm, line 130: TypeError: undefined has no properties

which supposedly means that somebody calls a textbox method [3] after the textbox was deleted.

There's no logs from the original failure left so it's hard to tell whether this is a cause of the spike or not, but I suppose it's plausible, at least this is a thing to start from. However knowing a little about the code, it's hard to say who the caller could it be. Is there a way to get a stack trace for this failure or are there ideas/pointers how to untwist the failure further?

[1] https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=261864639&repo=mozilla-central&lineNumber=3201
[2] https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=261455803&repo=mozilla-central&lineNumber=3593
[3] https://hg.mozilla.org/mozilla-central/file/eff8c62bdeb71f78dbf0e2324b31d094355cf3dc/browser/components/urlbar/UrlbarInput.jsm#l130

Flags: needinfo?(surkov.alexander)

(In reply to alexander :surkov (:asurkov) from comment #62)

Both of these failures are preceded by JS failure [1] and [2]
modules/UrlbarInput.jsm, line 130: TypeError: undefined has no properties

which supposedly means that somebody calls a textbox method [3] after the textbox was deleted.

There are loads of these failures in all the tests in those runs, it would kind of surprise me if it was actually related...

There's no logs from the original failure left so it's hard to tell whether this is a cause of the spike or not, but I suppose it's plausible, at least this is a thing to start from. However knowing a little about the code, it's hard to say who the caller could it be. Is there a way to get a stack trace for this failure or are there ideas/pointers how to untwist the failure further?

So as noted I suspect that won't help. It looks to me like something is interfering with the overflow button appearing and/or the popup showing. The first thing to figure out is which one of those 2, and the second would be figuring out why... in the past this has been due to stray focus/window opening/closing events that auto-close the popup, perhaps from previous tests?

Flags: needinfo?(gijskruitbosch+bugs)

:surkov, any updates here? I'm concerned that something has changed wrt popups opening without anyone knowing exactly what/why...

Flags: needinfo?(surkov.alexander)

(In reply to :Gijs (he/him) from comment #63)

So as noted I suspect that won't help. It looks to me like something is interfering with the overflow button appearing and/or the popup showing.
The first thing to figure out is which one of those 2, and the second would be figuring out why...

It hangs while waiting overflow button [1], I suppose the button never appears, i.e. addWidgetToArea fails? Having no way to reproduce locally, do you have suggestions how to investigate it?

in the past this has been due to stray focus/window opening/closing events that auto-close the popup, perhaps from previous tests?

How to check it? From logs, screenshots or?

Flags: needinfo?(surkov.alexander) → needinfo?(gijskruitbosch+bugs)

(In reply to alexander :surkov (:asurkov) from comment #66)

(In reply to :Gijs (he/him) from comment #63)

So as noted I suspect that won't help. It looks to me like something is interfering with the overflow button appearing and/or the popup showing.
The first thing to figure out is which one of those 2, and the second would be figuring out why...

It hangs while waiting overflow button [1],

Was there meant to be a link here?

I suppose the button never appears, i.e. addWidgetToArea fails?

This would be surprising. I don't think there's a reasonable way for it to fail and not throw an exception which would break the test in other ways. But you could verify by logging CustomizableUI.getWidgetIdsInArea(CustomizableUI.AREA_FIXED_OVERFLOW_PANEL) (array of strings) or perhaps just document.getElementById(CustomizableUI.AREA_FIXED_OVERFLOW_PANEL).children (array of dom nodes).

It's possible that somehow we fail to set the right attributes to trigger the overflow button showing using CSS, ie https://searchfox.org/mozilla-central/rev/7088fc958db5935eba24b413b1f16d6ab7bd13ea/browser/components/customizableui/content/panelUI.js#688-695 somehow fails to run or the relevant CSS for nonemptyoverflow fails to cause the button to be shown.

Having no way to reproduce locally, do you have suggestions how to investigate it?

in the past this has been due to stray focus/window opening/closing events that auto-close the popup, perhaps from previous tests?

How to check it? From logs, screenshots or?

I'd add logging to https://searchfox.org/mozilla-central/rev/7088fc958db5935eba24b413b1f16d6ab7bd13ea/layout/xul/nsXULPopupManager.cpp#879 and the equivalent methods that show the popup to figure out when/why the popup is/isn't shown/hidden. nsTraceRefCnt::WalkTheStack(stderr) can help you get C++ traces.

The screenshot point is interesting because comment 59 suggests that the linux shippable failures show no overflow button, but the mac failure screenshots from the continuing failures (after disabling on linux) do show this button (cf. for https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=263873004&repo=autoland&lineNumber=9025 screenshot https://taskcluster-artifacts.net/LVFlRxynQ3CfEGyRthbVjA/0/public/test_info/mozilla-test-fail-screenshot_HIsKtK.png ). So it's not clear to me if the cause of the same-looking failure is the same on the two builds.

It's possible this is all timing-related, but I don't think I understand the impact that bug 1539651 would have had there. Do tests that run before this one open/close the bookmarks menu button, or add/remove it from the toolbar?

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(surkov.alexander)
Priority: -- → P1

(In reply to :Gijs (he/him) from comment #67)

The screenshot point is interesting because comment 59 suggests that the linux shippable failures show no overflow button, but the mac failure screenshots from the continuing failures (after disabling on linux) do show this button (cf. for https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=263873004&repo=autoland&lineNumber=9025 screenshot https://taskcluster-artifacts.net/LVFlRxynQ3CfEGyRthbVjA/0/public/test_info/mozilla-test-fail-screenshot_HIsKtK.png ). So it's not clear to me if the cause of the same-looking failure is the same on the two builds.

I'm confused. If the overflow button is shown as os x screenshot suggests it [1], then how could it possibly have zero width [2]?

[1] https://taskcluster-artifacts.net/LVFlRxynQ3CfEGyRthbVjA/0/public/test_info/mozilla-test-fail-screenshot_HIsKtK.png
[2] https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=263873004&repo=autoland&lineNumber=9023-9025

It's possible this is all timing-related, but I don't think I understand the impact that bug 1539651 would have had there. Do tests that run before this one open/close the bookmarks menu button, or add/remove it from the toolbar?

It's a good question. The previous test is browser_963639_customizing_attribute_non_customizable_toolbar.js which shows toolbar, enter cutomization mode, leaves it and removes the toolbar. I don't see anything explicit there that operates on bookmarks menubutton there.

Btw, do you have a recipe how to trigger the failure on try server. For example, is there an option to start a specific test on try server and run it until it fails?

Flags: needinfo?(surkov.alexander)
Depends on: 1582797

I expect this to be fixed by bug 1582797.

Assignee: nobody → dao+bmo
Status: REOPENED → RESOLVED
Closed: 6 years ago5 years ago
Keywords: leave-open
Resolution: --- → FIXED
Whiteboard: [stockwell disabled] → fixed by bug 1582797
Has Regression Range: --- → yes
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: