Intermittent browser/base/content/test/general/browser_aboutCertError.js | Uncaught exception - TypeError: advancedButton is null

RESOLVED DUPLICATE of bug 1272942

Status

Core
Security: PSM
P3
normal
2 years ago
2 years ago

People

(Reporter: Treeherder Bug Filer, Unassigned)

Tracking

({intermittent-failure})

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [psm-intermittent])

Component: Security: UI → Security: PSM
Priority: -- → P3
Whiteboard: [psm-intermittent]

Comment 1

2 years ago
22 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-beta: 6
* try: 5
* fx-team: 5
* autoland: 4
* mozilla-inbound: 2

Platform breakdown:
* windows8-64: 12
* windows7-32-vm: 5
* windows7-32: 4
* linux64: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1291489&startday=2016-08-01&endday=2016-08-07&tree=all

Comment 2

2 years ago
30 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-inbound: 19
* fx-team: 5
* autoland: 5
* mozilla-beta: 1

Platform breakdown:
* linux64: 14
* windows8-64: 8
* linux32: 5
* windows7-32: 3

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1291489&startday=2016-08-08&endday=2016-08-14&tree=all

Comment 3

2 years ago
This is happening on linux and win platforms (e10s debug, opt, & pgo):

https://brasstacks.mozilla.com/orangefactor/index.html?display=Bug&bugid=1291489&startday=2016-08-08&endday=2016-08-15&tree=trunk

And even more frequently on Windows 7 VM debug e10s, in this case 5x in 50 runs:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=e42b9cde2d23&selectedJob=25659287&filter-searchStr=bc7

Looks like the browser is crashing after loading the BAD_STS_CERT URL [1] in a new tab (?):

 13:41:21     INFO -  90 INFO Entering test bound checkAdvancedDetailsForHSTS
 13:41:21     INFO -  91 INFO Loading a bad STS cert page and verifying the advanced details section
 13:41:21     INFO -  92 INFO Waiting for DOMContentLoaded event
 13:41:21     INFO -  93 INFO Console message: [JavaScript Error: "badchain.include-subdomains.pinning.example.com:443 uses an invalid security certificate.
 13:41:21     INFO -  The certificate is only valid for include-subdomains.pinning.example.com
 13:41:21     INFO -  Error code: <a id="errorCode" title="SSL_ERROR_BAD_CERT_DOMAIN">SSL_ERROR_BAD_CERT_DOMAIN</a>
 13:41:21     INFO -  "]
 13:41:21     INFO -  94 INFO Console message: [JavaScript Error: "remote browser crashed while on about:blank
 13:41:21     INFO -  " {file: "chrome://mochikit/content/mochitest-e10s-utils.js" line: 8}]

[1] https://badchain.include-subdomains.pinning.example.com/

:bgrins, as the author of the test, would you have a chance to take a look?
Flags: needinfo?(bgrinstead)

Comment 4

Sorry, I don't know why it'd be crashing when loading that URL.  Is there a way we can narrow down a regression range for when this intermittent started?  I guess we could go through and do a bunch of retriggers on every push before the first instance of this failure until it doesn't fail anymore - is there a way to automate that?
Flags: needinfo?(bgrinstead)

Comment 5

I have tried to automate that, and it is difficult. Narrowing it down is a good idea, though: I think we could find the first instance and retrigger heavily on linux/win7, 20 times each on the revisions from -10 to +5 around it, to see where this started.

In the case of Windows 7 VM, the tests are not run there by default (only on try). We are trying to get the tests running there, and this bug is occurring much more frequently on that platform.

:rwood, up for an attempt at bisecting/narrowing down the range?
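
A rough sketch of the retrigger plan described above, for anyone who wants to estimate the job count before kicking it off. The push list, the `retrigger_plan` helper, and the platform labels are hypothetical; this only builds the plan and does not talk to Treeherder or any other service.

```python
from itertools import product


def retrigger_plan(first_bad_index, pushes, platforms, before=10, after=5, runs=20):
    """Return (revision, platform, runs) tuples for the pushes around the
    first known failure: `before` pushes earlier through `after` pushes later."""
    start = max(0, first_bad_index - before)
    end = min(len(pushes), first_bad_index + after + 1)
    window = pushes[start:end]
    return [(rev, plat, runs) for rev, plat in product(window, platforms)]


# Hypothetical push log, oldest first; real revisions would come from the pushlog.
pushes = [f"rev{i:02d}" for i in range(30)]
plan = retrigger_plan(first_bad_index=15, pushes=pushes, platforms=["linux", "win7"])
total_jobs = sum(runs for _, _, runs in plan)
print(len(plan), "revision/platform pairs,", total_jobs, "jobs in total")
```

With the -10/+5 window, two platforms, and 20 runs each, this comes to 16 pushes, 32 revision/platform pairs, and 640 jobs, which gives a feel for the cost of the approach.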

Comment 6

2 years ago
(In reply to Joel Maher ( :jmaher) from comment #5) 
> :rwood, up for an attempt at bisecting/narrowing down the range?

Absolutely, I'm on it...

Comment 7

2 years ago
Update: OrangeFactor reports that the first revision where this intermittent showed up was 73a57814a495b29244ef5377e73488a3f3fabb15, on win 8 on 2nd-Aug:

https://brasstacks.mozilla.com/orangefactor/index.html?display=Bug&bugid=1291489&entireHistory=true&tree=trunk

So I did a bunch of runs on try, on that suspect revision ±3 revisions. I did them on win 7 vm debug because this intermittent happens more frequently on that platform. However, the intermittent actually showed up in each of the revisions:

https://treeherder.mozilla.org/#/jobs?repo=try&author=rwood@mozilla.com&fromchange=b0ec72d7e5d7bcf81d2cb452c66304a93ddea2cb&tochange=dfca8e5df69f15247eee2cc2c0f96355f1921212

:jmaher, is this possibly because the patch was introduced but the first win8 build with the patch was done at a later date? How can I narrow this down on win 7 vm? Thanks.
Flags: needinfo?(jmaher)

Comment 8

Great question. Possibly this has always been problematic on win7 vm, since we never ran it there. If this shows up in only 1 out of 100 runs, we may need to look up to 100 revisions prior to the first instance.

Maybe we could pick revisions from mozilla-central to narrow the day down, since we mostly merge once/day into m-c.  We could go up to 7 days prior, and if that isn't conclusive, then I would say this is an issue we need to debug and look at on win7-vm only.
Flags: needinfo?(jmaher)
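
To put rough numbers on the point above, here is a small sketch of how many green runs a revision needs before it can reasonably be called good. It assumes each run fails independently with a fixed probability, which is a simplification for a timing-dependent intermittent; the function names are made up for illustration.

```python
import math


def miss_probability(failure_rate, runs):
    """Chance that a bad revision passes every one of `runs` independent runs."""
    return (1.0 - failure_rate) ** runs


def runs_needed(failure_rate, confidence=0.95):
    """Smallest number of all-green runs for which a bad revision would have
    shown at least one failure with probability >= `confidence`."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


# 5 failures in 50 runs on Windows 7 VM (comment 3) is roughly a 10% rate.
print(round(miss_probability(0.10, 20), 3))  # chance that 20 retriggers all pass
print(runs_needed(0.10))                     # runs per revision at a 10% rate
print(runs_needed(0.01))                     # runs per revision at a 1-in-100 rate
```

Under these assumptions, 20 retriggers at a 10% failure rate still miss a bad revision about 12% of the time, about 29 runs are needed for 95% confidence, and a 1-in-100 rate needs roughly 299 runs per revision.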

Comment 9

2 years ago
40 automation job failures were associated with this bug yesterday.

Repository breakdown:
* mozilla-inbound: 15
* autoland: 14
* fx-team: 6
* mozilla-central: 5

Platform breakdown:
* windows8-64: 20
* linux64: 15
* windows7-32: 5

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1291489&startday=2016-08-17&endday=2016-08-17&tree=all
Something odd happened here yesterday: we went from 30 failures in the last week to 40 in the last day.
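
For scale, a quick back-of-the-envelope comparison of the two figures, using only the counts reported in comment 2 (30 failures over 7 days) and comment 9 (40 failures in one day). It ignores any change in the number of pushes or jobs per day, so treat it as a rough ratio only.

```python
weekly_failures = 30  # comment 2: failures in the last 7 days
daily_failures = 40   # comment 9: failures yesterday

baseline_per_day = weekly_failures / 7
ratio = daily_failures / baseline_per_day
print(f"baseline {baseline_per_day:.1f}/day, now {daily_failures}/day, about {ratio:.1f}x")
```

That is roughly 4.3 failures per day before versus 40 per day now, about a 9x jump.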

Comment 13

2 years ago
43 automation job failures were associated with this bug yesterday.

Repository breakdown:
* mozilla-inbound: 14
* autoland: 11
* fx-team: 7
* mozilla-aurora: 5
* mozilla-beta: 4
* try: 1
* mozilla-central: 1

Platform breakdown:
* windows8-64: 21
* linux64: 19
* windows7-32: 3

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1291489&startday=2016-08-18&endday=2016-08-18&tree=all

Comment 14

2 years ago
51 automation job failures were associated with this bug yesterday.

Repository breakdown:
* mozilla-inbound: 24
* fx-team: 12
* mozilla-central: 7
* autoland: 6
* mozilla-beta: 1
* mozilla-aurora: 1

Platform breakdown:
* windows8-64: 29
* linux64: 17
* linux32: 3
* windows7-32: 2

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1291489&startday=2016-08-19&endday=2016-08-19&tree=all

Comment 15

I did a lot of retriggers:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=bc7%20e10s%20win%20x64&tochange=5aa02060f7c47a1d4c2c198c18ab7fc59771467b&fromchange=6a6829ccc2b43074b8f3542b3db3213224502eab

It looks like this is the culprit:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=53bbfa02d45d

:birtles, I see you are the author of this, can you weigh in on why we are getting this failure about 3x more frequently after your patches landed?
Blocks: 1286476
Flags: needinfo?(bbirtles)

Comment 16

Those patches should have no effect on animations that play forwards (and I don't see anything in the test or what it triggers that reverses animations). They also shouldn't have any effect on the results of getElementById. I can only suggest that these patches aren't responsible but that there's some race going on here, and perhaps these patches happened to tickle something that affected the timing of this test (although they really should have almost zero impact on timing).
Flags: needinfo?(bbirtles)

Comment 17

2 years ago
200 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-inbound: 84
* autoland: 43
* fx-team: 33
* mozilla-central: 16
* mozilla-beta: 10
* mozilla-aurora: 9
* try: 3
* ash: 2

Platform breakdown:
* windows8-64: 95
* linux64: 81
* windows7-32: 19
* linux32: 3
* windows7-32-vm: 2

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1291489&startday=2016-08-15&endday=2016-08-21&tree=all

Comment 18

Were this going to be something susceptible to retriggering to find a cause, wouldn't you want to start retriggering on beta, given that it was filed from there?
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1272942

Comment 19

2 years ago
29 automation job failures were associated with this bug yesterday.

Repository breakdown:
* mozilla-inbound: 11
* autoland: 7
* mozilla-aurora: 6
* mozilla-central: 4
* try: 1

Platform breakdown:
* linux64: 16
* windows8-64: 10
* windows7-32: 3

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1291489&startday=2016-08-22&endday=2016-08-22&tree=all

Comment 20

2 years ago
31 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-inbound: 11
* autoland: 7
* mozilla-aurora: 6
* mozilla-central: 4
* fx-team: 2
* try: 1

Platform breakdown:
* linux64: 17
* windows8-64: 11
* windows7-32: 3

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1291489&startday=2016-08-22&endday=2016-08-28&tree=all