Closed Bug 1303298 Opened 3 years ago Closed 3 years ago

Intermittent browser/base/content/test/general/browser_(mixed_content_cert_override|addCertException).js | Uncaught exception - TypeError: content.document.getElementById(...) is null

Categories

(Firefox :: Security, defect, P2)

defect

Tracking

()

RESOLVED FIXED
Firefox 54
Tracking Status
firefox50 --- wontfix
firefox51 --- wontfix
firefox52 --- fixed
firefox53 --- fixed
firefox54 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: johannh)

References

Details

(Keywords: intermittent-failure, Whiteboard: [fxprivacy][stockwell fixed])

Attachments

(1 file)

Whiteboard: [fxprivacy] [triage]
Priority: -- → P2
:johannh, this really became a problematic orange on the 22nd, it went from 50 failures the previous week to 79 failures last week.  

As this is on all branches and on osx opt e10s primarily, it does narrow it down nicely.

here is a clip from a log [1]:
14:45:28     INFO - TEST-START | browser/base/content/test/general/browser_mixed_content_cert_override.js
14:45:28     INFO - *************************
14:45:28     INFO - A coding exception was thrown and uncaught in a Task.
14:45:28     INFO - Full message: TypeError: content.document.getElementById(...) is null
14:45:28     INFO - Full stack: @chrome://mochikit/content/tests/BrowserTestUtils/content-task.js line 52 > eval:4:5
14:45:28     INFO - TaskImpl_run@resource://gre/modules/Task.jsm:319:42
14:45:28     INFO - TaskImpl@resource://gre/modules/Task.jsm:277:3
14:45:28     INFO - createAsyncFunction/asyncFunction@resource://gre/modules/Task.jsm:252:14
14:45:28     INFO - Task_spawn@resource://gre/modules/Task.jsm:166:12
14:45:28     INFO - @chrome://mochikit/content/tests/BrowserTestUtils/content-task.js:54:5
14:45:28     INFO - *************************
14:45:28     INFO - TEST-INFO | started process screencapture
14:45:28     INFO - TEST-INFO | screencapture: exit 0
14:45:28     INFO - Buffered messages logged at 14:45:28
14:45:28     INFO - Entering test bound 
14:45:28     INFO - Console message: [JavaScript Error: "self-signed.example.com:443 uses an invalid security certificate.
14:45:28     INFO - 
14:45:28     INFO - The certificate is not trusted because it is self-signed.
14:45:28     INFO - 
14:45:28     INFO - Error code: <a id="errorCode" title="SEC_ERROR_UNKNOWN_ISSUER">SEC_ERROR_UNKNOWN_ISSUER</a>
14:45:28     INFO - "]
14:45:28     INFO - Buffered messages finished
14:45:28     INFO - TEST-UNEXPECTED-FAIL | browser/base/content/test/general/browser_mixed_content_cert_override.js | Uncaught exception - TypeError: content.document.getElementById(...) is null
14:45:28     INFO - Leaving test bound 
14:45:28     INFO - MEMORY STAT | vsize 3682MB | residentFast 634MB | heapAllocated 178MB
14:45:28     INFO - TEST-OK | browser/base/content/test/general/browser_mixed_content_cert_override.js | took 126ms

and the related screenshot is at [2].


[1] https://archive.mozilla.org/pub/firefox/tinderbox-builds/autoland-macosx64/1480282519/autoland_yosemite_r7_test-mochitest-e10s-browser-chrome-7-bm132-tests1-macosx-build482.txt.gz
[2] http://mozilla-releng-blobs.s3.amazonaws.com/blobs/autoland/sha512/e3264cd5681aaae32221ac5e2c8af881185411519fd7ac5f89fa7fc39dfd9c66ff67f800bc100d28ca9e02bc8a074c251268da9a50694044aafb44eb8c44618a

:johannh, can you get this bug assigned and get some traction on this bug?
Flags: needinfo?(jhofmann)
Yes, sorry for the delay, it's on my list for this week :)
Flags: needinfo?(jhofmann)
Whiteboard: [fxprivacy] [triage] → [fxprivacy]
So on OSX there seems to be an inexplainable delay until the call to certOverrideService.clearValidityOverride gets picked up.

I have no idea why this has increased recently, probably due to platform work or maybe because the tests are running faster and so the clearValidityOverride from browser_addCertExpection has less time to "brew".

We should talk to platform security about this, but since this intermittent has gotten quite frequent the little hack from this patch should also fix it for now.
Comment on attachment 8815331 [details]
Bug 1303298 - Use BrowserTestUtils.waitForErrorPage to wait for certerror page load.

https://reviewboard.mozilla.org/r/96302/#review96504

::: browser/base/content/test/general/head.js:1028
(Diff revision 1)
>                               "cert-exception-ui-ready", false);
>    });
>  
> +  // Sometimes clearing the cert override is not immediately picked up,
> +  // so we reload until we are on an actual cert error page.
> +  yield BrowserTestUtils.waitForCondition(function*() {

When you say "not immediately picked up", do you mean that certOverrideService.getValidityOverride returns false? Or that it returns true, but somehow the cert error page still takes time to "pick up" the change?

If it's the former, I think this code would look nicer if you just waited for getValidityOverride to return true; that would avoid the ContentTask and also make this test immune to CSS changes :P

Also btw, if this is really a problem, maybe we want to add a helper function for clearing overrides, which calls clearValidityOverride and returns a Promise which resolves when getValidityOverride is false?

In any case, I'd do a try push with plenty of retriggers before landing this. :)
What. The patch fixed it for the only failing platform (osx opt) but introduced the same error for ALL OTHER platforms. How on earth.
Nihanth, this should be ready for review now. Seems like someone didn't document that the second argument to waitForCondition is indeed an error message, not the interval. I'll make a separate bug to fix that.
Assignee: nobody → jhofmann
Comment on attachment 8815331 [details]
Bug 1303298 - Use BrowserTestUtils.waitForErrorPage to wait for certerror page load.

https://reviewboard.mozilla.org/r/96302/#review97318

Thanks!

::: browser/base/content/test/general/head.js:1034
(Diff revision 3)
> -  yield BrowserTestUtils.loadURI(gBrowser.selectedBrowser, url);
> +    yield BrowserTestUtils.loadURI(gBrowser.selectedBrowser, url);
> -  yield promiseErrorPageLoaded(gBrowser.selectedBrowser);
> +    yield promiseErrorPageLoaded(gBrowser.selectedBrowser);
> +    let isErrorPage = yield ContentTask.spawn(gBrowser.selectedBrowser, null, function*() {
> +      return content.document.documentURI.startsWith("about:certerror");
> +    });
> +    return isErrorPage;

waitForCondition yields on the condition function, so this can be simplified to just |return ContentTask.spawn...|

Although, setting a variable is more readable so I don't have a preference. :)
Attachment #8815331 - Flags: review?(nhnt11) → review+
Comment on attachment 8815331 [details]
Bug 1303298 - Use BrowserTestUtils.waitForErrorPage to wait for certerror page load.

https://reviewboard.mozilla.org/r/96302/#review97474
Pushed by jhofmann@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/0eef7033686f
Work around intermittent browser_mixed_content_cert_override.js. r=nhnt11
https://hg.mozilla.org/mozilla-central/rev/0eef7033686f
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 53
Sorry, but failures continue, as in:

https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=40352660#L10058
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
possibly related to bug 1292960
See Also: → 1292960
We've built 51 RC. Mark 51 won't fix.
Since I'm not much closer to finding the origin of the slight intermittent delay in cert overrides, this patch should hopefully make sure that builds either timeout if the button never appears or that everything works correctly. I hope that this either solves this intermittent or at least significantly reduces it.
Retriggering try a zillion times on try didn't lead to an error but we already had that before and the situation actually got worse, so I'm not very optimistic.
Comment on attachment 8815331 [details]
Bug 1303298 - Use BrowserTestUtils.waitForErrorPage to wait for certerror page load.

Nihanth, thanks for helping me out via IRC. This could be related to DOMContentLoaded not being the right event to listen to. I replaced it with waitForErrorPage as per your suggestion.
Attachment #8815331 - Flags: review+ → review?(nhnt11)
We should replace all occurrences of promiseErrorPageLoaded with waitForErrorPage in any case. I'll make a new bug once this one lands. Didn't want to mix the two issues.
Comment on attachment 8815331 [details]
Bug 1303298 - Use BrowserTestUtils.waitForErrorPage to wait for certerror page load.

https://reviewboard.mozilla.org/r/96302/#review108522

::: browser/base/content/test/general/head.js:1019
(Diff revision 5)
>  
> -  // Sometimes clearing the cert override is not immediately picked up,
> -  // so we reload until we are on an actual cert error page.
> -  yield BrowserTestUtils.waitForCondition(function*() {
> -    yield BrowserTestUtils.loadURI(gBrowser.selectedBrowser, url);
> +  yield BrowserTestUtils.loadURI(gBrowser.selectedBrowser, url);
> -    yield promiseErrorPageLoaded(gBrowser.selectedBrowser);
> +  yield BrowserTestUtils.waitForErrorPage(gBrowser.selectedBrowser);

Hmm, usually this would be fine but considering the history with this test, let's be super thorough and start waiting for the error page before we load the URI, i.e.:

let p = BTU.waitForErrorPage(...);
yield BTU.loadURI(...);
yield p;
Attachment #8815331 - Flags: review?(nhnt11) → review+
Good call, thanks!
Pushed by jhofmann@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e60298cd3d42
Use BrowserTestUtils.waitForErrorPage to wait for certerror page load. r=nhnt11
https://hg.mozilla.org/mozilla-central/rev/e60298cd3d42
Status: REOPENED → RESOLVED
Closed: 3 years ago3 years ago
Resolution: --- → FIXED
Duplicate of this bug: 1292960
Summary: Intermittent browser/base/content/test/general/browser_mixed_content_cert_override.js | Uncaught exception - TypeError: content.document.getElementById(...) is null → Intermittent browser/base/content/test/general/browser_(mixed_content_cert_override|addCertException).js | Uncaught exception - TypeError: content.document.getElementById(...) is null
Comment on attachment 8815331 [details]
Bug 1303298 - Use BrowserTestUtils.waitForErrorPage to wait for certerror page load.

https://reviewboard.mozilla.org/r/96302/#review108872

::: browser/base/content/test/general/head.js:1019
(Diff revision 6)
>    });
>  
> -  // Sometimes clearing the cert override is not immediately picked up,
> +  let loaded = BrowserTestUtils.waitForErrorPage(gBrowser.selectedBrowser);
> -  // so we reload until we are on an actual cert error page.
> -  yield BrowserTestUtils.waitForCondition(function*() {
> -    yield BrowserTestUtils.loadURI(gBrowser.selectedBrowser, url);
> +  yield BrowserTestUtils.loadURI(gBrowser.selectedBrowser, url);

Driveby nit: what's the point of the extra yield here? Doesn't seem like it matters given we're yielding for the actual load.
(In reply to :Gijs from comment #58)
> Comment on attachment 8815331 [details]
> Bug 1303298 - Use BrowserTestUtils.waitForErrorPage to wait for certerror
> page load.
> 
> https://reviewboard.mozilla.org/r/96302/#review108872
> 
> ::: browser/base/content/test/general/head.js:1019
> (Diff revision 6)
> >    });
> >  
> > -  // Sometimes clearing the cert override is not immediately picked up,
> > +  let loaded = BrowserTestUtils.waitForErrorPage(gBrowser.selectedBrowser);
> > -  // so we reload until we are on an actual cert error page.
> > -  yield BrowserTestUtils.waitForCondition(function*() {
> > -    yield BrowserTestUtils.loadURI(gBrowser.selectedBrowser, url);
> > +  yield BrowserTestUtils.loadURI(gBrowser.selectedBrowser, url);
> 
> Driveby nit: what's the point of the extra yield here? Doesn't seem like it
> matters given we're yielding for the actual load.

Good point, but I don't think it's worth a followup patch :)
Duplicate of this bug: 1334708
Target Milestone: Firefox 53 → Firefox 54
Whiteboard: [fxprivacy] → [fxprivacy][stockwell fixed]
You need to log in before you can comment on or make changes to this bug.