Bug 1291489 (Closed) · Opened 8 years ago · Closed 8 years ago

Intermittent browser/base/content/test/general/browser_aboutCertError.js | Uncaught exception - TypeError: advancedButton is null

Categories

(Core :: Security: PSM, defect, P3)

Tracking

RESOLVED DUPLICATE of bug 1272942

People

(Reporter: intermittent-bug-filer, Unassigned)

References

Details

(Keywords: intermittent-failure, Whiteboard: [psm-intermittent])

Component: Security: UI → Security: PSM
Priority: -- → P3
Whiteboard: [psm-intermittent]
This is happening on Linux and Windows platforms (e10s debug, opt, and pgo):

https://brasstacks.mozilla.com/orangefactor/index.html?display=Bug&bugid=1291489&startday=2016-08-08&endday=2016-08-15&tree=trunk

And even more frequently on Windows 7 VM debug e10s, in this case 5x in 50 runs:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=e42b9cde2d23&selectedJob=25659287&filter-searchStr=bc7

Looks like the browser is crashing after loading the BAD_STS_CERT URL [1] in a new tab (?):

 13:41:21     INFO -  90 INFO Entering test bound checkAdvancedDetailsForHSTS
 13:41:21     INFO -  91 INFO Loading a bad STS cert page and verifying the advanced details section
 13:41:21     INFO -  92 INFO Waiting for DOMContentLoaded event
 13:41:21     INFO -  93 INFO Console message: [JavaScript Error: "badchain.include-subdomains.pinning.example.com:443 uses an invalid security certificate.
 13:41:21     INFO -  The certificate is only valid for include-subdomains.pinning.example.com
 13:41:21     INFO -  Error code: <a id="errorCode" title="SSL_ERROR_BAD_CERT_DOMAIN">SSL_ERROR_BAD_CERT_DOMAIN</a>
 13:41:21     INFO -  "]
 13:41:21     INFO -  94 INFO Console message: [JavaScript Error: "remote browser crashed while on about:blank
 13:41:21     INFO -  " {file: "chrome://mochikit/content/mochitest-e10s-utils.js" line: 8}]

[1] https://badchain.include-subdomains.pinning.example.com/
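
For context, the failing step looks roughly like this (a sketch of the test's structure, not the exact source; waitForCertErrorLoad and BAD_STS_CERT stand in for the test's own helpers). If the remote browser crashes and the tab falls back to about:blank, the getElementById lookup returns null and the click() throws the TypeError from the summary:

// Sketch of the checkAdvancedDetailsForHSTS flow (illustrative, not verbatim).
add_task(function* checkAdvancedDetailsForHSTS() {
  info("Loading a bad STS cert page and verifying the advanced details section");
  let tab = gBrowser.selectedTab = gBrowser.addTab(BAD_STS_CERT);
  let browser = tab.linkedBrowser;
  yield waitForCertErrorLoad(browser); // resolves after DOMContentLoaded

  yield ContentTask.spawn(browser, null, function* () {
    let doc = content.document;
    // If the content process crashed, doc is the about:blank replacement
    // and this lookup returns null...
    let advancedButton = doc.getElementById("advancedButton");
    // ...so this throws "TypeError: advancedButton is null".
    advancedButton.click();
    // (verification of the advanced details section follows)
  });
});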

:bgrins, as the author of the test, would you have a chance to take a look?
Flags: needinfo?(bgrinstead)
Sorry, I don't know why it'd be crashing when loading that URL. Is there a way we can narrow down a regression range for when this intermittent started? I guess we could go through and do a bunch of retriggers on every push before the first instance of this failure until it stops failing - is there a way to automate that?
Flags: needinfo?(bgrinstead)
I have tried to automate that; it is difficult. Good idea to narrow it down. I think we could find the first instance and retrigger like crazy on linux/win7, 20 times each for the 10 revisions before and the 5 after, to see where this started.

In the case of the windows 7 VM, the tests are not run there by default (only on try). We are trying to get the tests running there, and this bug is occurring much more frequently.

:rwood, up for an attempt at bisecting/narrowing down the range?
(In reply to Joel Maher ( :jmaher) from comment #5) 
> :rwood, up for an attempt at bisecting/narrowing down the range?

Absolutely, I'm on it...
Update: OrangeFactor reports the first revision where this intermittent showed up as 73a57814a495b29244ef5377e73488a3f3fabb15, on Windows 8, on August 2nd:

https://brasstacks.mozilla.com/orangefactor/index.html?display=Bug&bugid=1291489&entireHistory=true&tree=trunk

So I did a bunch of runs on try on that suspect revision, plus and minus 3 revisions. I did them on win 7 VM debug, because this intermittent happens more frequently on that platform. However, the intermittent actually showed up on every one of those revisions:

https://treeherder.mozilla.org/#/jobs?repo=try&author=rwood@mozilla.com&fromchange=b0ec72d7e5d7bcf81d2cb452c66304a93ddea2cb&tochange=dfca8e5df69f15247eee2cc2c0f96355f1921212

:jmaher, is this possibly because the patch landed but the first win8 build containing it was done at a later date? How can I narrow this down on the win 7 VM? Thanks.
Flags: needinfo?(jmaher)
Great question. Possibly this has always been problematic on the win7 VM, since we never ran the tests there. If this only shows up in 1 out of 100 runs, we may need to look at up to 100 revisions prior to the first observed instance.

Maybe we could pick revisions from mozilla-central to narrow the day down, since we mostly merge into m-c once per day. We could go up to 7 days prior, and if that isn't conclusive, then I would say this is an issue we need to debug and look at on the win7 VM only.
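
As a rough sanity check on the retrigger counts (a sketch; the 95% confidence target is my assumption, and the rates come from the numbers above):

// Chance of seeing at least one failure in n runs at per-run rate p is
// 1 - (1 - p)^n, so to reach confidence c we need
// n >= log(1 - c) / log(1 - p).
function runsNeeded(p, c) {
  return Math.ceil(Math.log(1 - c) / Math.log(1 - p));
}
console.log(runsNeeded(0.01, 0.95)); // 299: a 1-in-100 orange needs ~300 runs
console.log(runsNeeded(0.10, 0.95)); // 29: at the win7 VM rate (5 in 50)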
Flags: needinfo?(jmaher)
Something odd happened here yesterday: we went from 30 failures in the last week to 40 in the last day.
I did a lot of retriggers:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=bc7%20e10s%20win%20x64&tochange=5aa02060f7c47a1d4c2c198c18ab7fc59771467b&fromchange=6a6829ccc2b43074b8f3542b3db3213224502eab

It looks like this is the culprit:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=53bbfa02d45d

:birtles, I see you are the author of this; can you weigh in on why we are getting this failure about 3x more frequently after your patches landed?
Blocks: 1286476
Flags: needinfo?(bbirtles)
Those patches should have no effect on animations that play forwards (and I don't see anything in the test, or in what it triggers, that reverses animations). They also shouldn't have any effect on the results of getElementById. I can only suggest that these patches aren't responsible, but that there's some race going on here, and perhaps these patches happened to tickle something that affected the timing of this test (although they really should have almost zero impact on timing).
Flags: needinfo?(bbirtles)
If this were something susceptible to retriggering to find a cause, wouldn't you want to start retriggering on beta, given that it was filed from there?
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → DUPLICATE