Mixed content favicon is shown when loading a SSL site right after opening a new tab

VERIFIED FIXED in Firefox 32

Status

()

defect
VERIFIED FIXED
5 years ago
5 years ago

People

(Reporter: AndreeaMatei, Assigned: Mardak)

Tracking

({reproducible})

unspecified
Firefox 35
Points:
---
Dependency tree / graph
Bug Flags:
qe-verify -

Firefox Tracking Flags

(firefox31 wontfix, firefox32+ verified, firefox33+ verified, firefox34+ verified, firefox35 verified, firefox-esr31 wontfix)

Details

(Whiteboard: [mozmill])

Attachments

(5 attachments, 8 obsolete attachments)

This fails on most locales, while testing all of them. I believe it's intermittent as we've ran some several times and it hasn't always failed.

Cosmin is investigating this.

http://mozmill-sandbox.blargon7.com/#/remote/report/533154d2e18fa05bbdf2eaada88b2e3c
http://mozmill-sandbox.blargon7.com/#/remote/report/533154d2e18fa05bbdf2eaada89b3689
http://mozmill-sandbox.blargon7.com/#/remote/report/533154d2e18fa05bbdf2eaada89b0838
Blocks: 1045009
Posted file testCase.js (obsolete) —
This is the testcase for the failure, with the th locale it fails once in 4 runs.

So the issue is that if we navigate to ftp page prior of opening "https://ssl-dv.mozqa.com" sometimes istead of getting a verified domain identity "verifiedDomain", we get "unknownIdentity mixedContent mixedDisplayContent".
Andreea, can we get this bug reassigned? Cosmin is out this week, and it seems to be important that we make some progress here while he is away.
Assignee: cosmin.malutan → nobody
Couldn't reproduce this locally on my ubuntu 14.04 x64 machine with the th locale.
No failure in 200 runs.
I'll investigate this further.
Assignee: nobody → daniel.gherasim
Finally able to reproduce it. Will investigate this, maybe I find the reason.
I have run the testcase on all branches with the 'th' locale.

* Removed from setupModule this line:
> aModule.controller.mainMenu.click("#menu_close");
as, at least on aurora, caused the firefox to close before starting to run the test.
(In reply to daniel.gherasim from comment #5)
> * Removed from setupModule this line:
> > aModule.controller.mainMenu.click("#menu_close");

I don't see that line in setupModule at all for the DV certificate test. So what are you referring to?
(In reply to Henrik Skupin (:whimboo) from comment #6)
> (In reply to daniel.gherasim from comment #5)
> > * Removed from setupModule this line:
> > > aModule.controller.mainMenu.click("#menu_close");
> 
> I don't see that line in setupModule at all for the DV certificate test. So
> what are you referring to?

I was referring to the testcase cosmin provided. Not sure why that line was present at all.
Trying to reduce the testcase now.
Posted file test-case.js (obsolete) —
We actually don't have to restart firefox to reproduce this, we actually do this:

1. Open firefox
2. Access a page (like https://www.google.com)
3. Access 'about:newtab'
4. Access 'https://ssl-dv.mozqa.com'

Instead of the grey lock icon, we get a grey triangle with the exclamation sign inside.

I've reduced the testcase to this. Manually I can't reproduce this so I'll continue to investigate what we do different in our testcase.
Attachment #8465408 - Attachment is obsolete: true
Posted image example.png (obsolete) —
Well, which certificate problem is that? You should give those details.
Posted file test-case.js (obsolete) —
We get mixedContent (SSL with unauthenticated display content) only for the *DV* certificates & only if we access "about:newtab" before.

I also tried https://www.google.com instead of https://ssl-dv.mozqa.com and fails too.

Minimized the testcase again.
Attachment #8469969 - Attachment is obsolete: true
Posted file test-case.js (obsolete) —
It's not reproducing on CI machines. Bot locally I can reproduce it on 2 different machines.
Attachment #8469986 - Attachment is obsolete: true
So for which locales does that fail? Can you provide a list? It's interesting that this passes on en-US. There should be no difference to l10n builds.
(In reply to Henrik Skupin (:whimboo) from comment #13)
> So for which locales does that fail? Can you provide a list? It's
> interesting that this passes on en-US. There should be no difference to l10n
> builds.

It's failing with all locales except the en-US one, but can't reproduce it manually :(
So I was using your testcase as attached to this bug with the latest de locale of Aurora. All is working fine for me on Ubuntu 64.
(In reply to Henrik Skupin (:whimboo) from comment #15)
> So I was using your testcase as attached to this bug with the latest de
> locale of Aurora. All is working fine for me on Ubuntu 64.

Well, that's odd. I also couldn't reproduce with Aurora de. But it reproduced with latest Beta de & latest Nightly de. 

(Forgot to specify that all my tests were with Beta locales).
Sorry but both Nightly and Beta are working fine. The testcase doesn't fail for me.
(In reply to Henrik Skupin (:whimboo) from comment #17)
> Sorry but both Nightly and Beta are working fine. The testcase doesn't fail
> for me.

Wondering if the network could be at play here...
After installing 'Charles' for checking the proxy issues and after building FF to be able to log dumps I found-out that in fact this doesn't reproduce with nightly but only with beta, so I'll try to make the regresion range here. I checked this whit my testcase https://bugzilla.mozilla.org/attachment.cgi?id=8465408
This has been introduced between beta 32.0b7 and 32.0b8 and is gone in aurora and nightly.
So it's 100% reproducible for mi with beta 32.0b8.
I couldn't reproduce it on staging, so might indeed be something locally, maybe we catch something, I remember I saw something similar a year ago.
I builded Firefox from source to make the regression range between Beta 32.0b7 and 32.0b8:
http://hg.mozilla.org/releases/mozilla-beta/pushloghtml?fromchange=9ebeb7669cda&tochange=10f7c4eae50c
So the changeset that causes this failure is:
http://hg.mozilla.org/releases/mozilla-beta/rev/6790f9333fec Bug 1039881
Assignee: daniel.gherasim → cosmin.malutan
Posted image lock-icon.png (obsolete) —
The change in preference browser.newtabpage.directory.source caused this:
http://hg.mozilla.org/releases/mozilla-beta/rev/6790f9333fec#l1.12

Attached is the difference in the identity-icon for the same page.
Thanks Cosmin for finding the regression range here. In general please immediately add the regressor to the dependencies, so the assignee is aware of. Even better is to CC him to the bug.

(In reply to Cosmin Malutan from comment #23)
> Created attachment 8477279 [details]
> lock-icon.png
> 
> The change in preference browser.newtabpage.directory.source caused this:
> http://hg.mozilla.org/releases/mozilla-beta/rev/6790f9333fec#l1.12
> 
> Attached is the difference in the identity-icon for the same page.

What is pre and what post the change? How is this related to tiles and the about:newtab page? It's not that clear to me.
Blocks: 1039881
Cosmin, can you test by removing this line that was added in the change:

http://hg.mozilla.org/releases/mozilla-beta/rev/6790f9333fec#l4.12
     4.2 +++ b/dom/base/nsGlobalWindow.cpp
    4.12 +  NS_ENSURE_TRUE(GetContextInternal(), NS_ERROR_NOT_INITIALIZED);
bobowen/bz, bholley seems to be on PTO for this week, so asking either of you for guidance.

bholley suggested adding the NS_ENSURE_TRUE(GetContextInternal) to SetNewDocument to fix a TART crash when landing bug 1039881 directly to beta.

Would it be possible that added check would cause some logic of detecting whether or not to show a lock vs security warn icon? I've asked Cosmin to see if removing that line fixes beta.
Also from bug 1039881, I confirmed that this "NS_ENSURE_TRUE(GetContextInternal) to SetNewDocument" wasn't needed on mozilla-central as smaug already fixed it with bug 1031303.

smaug, do you think uplifting all of bug 1031303 would fix this mozilla-beta bug (because I only landed that one line not knowing you had a complete fix).
I wouldn't dare to uplift bug 1031303 to beta this late.
(and I don't really know whether it would help here)
(In reply to Ed Lee :Mardak from comment #25)
> Cosmin, can you test by removing this line that was added in the change:
> 
> http://hg.mozilla.org/releases/mozilla-beta/rev/6790f9333fec#l4.12
>      4.2 +++ b/dom/base/nsGlobalWindow.cpp
>     4.12 +  NS_ENSURE_TRUE(GetContextInternal(), NS_ERROR_NOT_INITIALIZED);

Ed, would you mind to create a try server build with this line removed? Not sure if cosmin has a dev build ready on this box. But he could certainly test the try build tomorrow morning when he comes online.
(In reply to Henrik Skupin (:whimboo) from comment #30)
> Ed, would you mind to create a try server build with this line removed?
https://tbpl.mozilla.org/?tree=Try&rev=ae262863a691

I believe builds should appear http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/edilee@gmail.com-ae262863a691
Can someone describe what user functionality this is currently breaking on beta? Is this incorrectly showing mixed security warning for live pages that users would be visiting?

Assuming the added NS_ENSURE_TRUE(GetContextInternal) is causing this bug, not having that line causes TART crashes. Backing out that whole patch (i.e., not having bug 1039881) on beta will cause directory tiles to be shipped to release users.

Given that, we have some immediate actions that involve backing out code and no new code:

- ship 32 with this bug
- ship 32 crashing tart
- ship 32 with directory tiles

A simple 1 line pref talos change that might fix tart is browser.pagethumbnails.capturing_disabled = true
(In reply to Ed Lee :Mardak from comment #32)
> A simple 1 line pref talos change that might fix tart is
> browser.pagethumbnails.capturing_disabled = true
(Where this implies backing out that NS_ENSURE_TRUE(GetContextInternal) change but setting a pref that hopefully avoids the tart crash.)
(In reply to Ed Lee :Mardak from comment #32)
> ...
> Given that, we have some immediate actions that involve backing out code and
> no new code:
> 
> - ship 32 with this bug
> - ship 32 crashing tart
> - ship 32 with directory tiles

Given these options (which I agree with), I'd say TART is the obvious one to break since broken TART in talos-only context doesn't affect users.

However, Given that it may crash at TART when running at talos (specifically, it apparently crashes because about:newtab is displayed with empty tiles, which triggers background thumbnails capture which somehow trigger the crash - unless that 1 line patch is applied), the big question is: can it also crash for users regardless of talos/TART?

I didn't get a reply to this question, and I re-asked it at bug 1039881 comment 40 just now, though bholley is apparently on PTO till September 1st.


> A simple 1 line pref talos change that might fix tart is
> browser.pagethumbnails.capturing_disabled = true

That's fine for TART, considering the circumstances, but since this won't be modified for users as far as I understand, the question still remains: can beta 32 crash for users without that 1 line patch while browser.pagethumbnails.capturing_disabled = true ?
(In reply to Avi Halachmi (:avih) from comment #34)
> ...
> That's fine for TART, considering the circumstances, but since this won't be
> modified for users as far as I understand, the question still remains: can
> beta 32 crash for users without that 1 line patch while
> browser.pagethumbnails.capturing_disabled = true ?

Typo. Should have been:

... while browser.pagethumbnails.capturing_disabled = false ?
32 release candidate 1 is already spinning. Unless this is deemed a stop ship issue, it's going out in 32 as is. Based on the screenshot in comment 24, I think we want someone like dougt, mmc, or geekboy to comment on whether we can ship with this bug.

Marking tracking for 32 and setting 33 and 34 to unaffected based on comment 20.
Flags: needinfo?(sstamm)
Flags: needinfo?(mmc)
Flags: needinfo?(dougt)
I don't have enough context here. What is TART? Is it https://wiki.mozilla.org/Buildbot/Talos/Tests#TART.2FCART?

Mixed content bugs should cc tanvi, and bugs involving lock icons and cert display need :keeler. David, Tanvi: it looks like the bug in 32 is that the grey triangle warning (state normally reached after clicking “Disable Protection" for mixed content doorhanger) shows instead of the DV lock icon if the user visits about:newtab before visiting an SSL page. 

https://bugzilla.mozilla.org/attachment.cgi?id=8477279

IMO, this is a pretty bad bug unless people completely ignore the lock indicator (which is not outside the realm of possibility). We have shipped with EV indicator bugs in the past but displaying a warning instead of a lock icon is a different level of breakage. Do we have numbers on how many people reach about:newtab, or click the triangle warning, or the lock icon?
Flags: needinfo?(tanvi)
Flags: needinfo?(mmc)
Flags: needinfo?(dkeeler)
Cosmin, Daniel: does this bug persist for the rest of the session after visiting about:newtab? I.e., is it only the first SSL page visited that shows the grey triangle, or all of them afterwards?
From reading this bug (not able to reproduce at the moment locally) this is pretty bad. My understanding is that while people don't notice the lack of a security indicator, they do notice negative indicators (which is why we downgraded the yellow mixed content warning to a gray one). I don't know off the top of my head, but I imagine the SECURITY_UI telemetry histogram would have some relevant data.
Flags: needinfo?(dkeeler)
(In reply to [:mmc] Monica Chew (please use needinfo) from comment #37)
> I don't have enough context here. What is TART? Is it
> https://wiki.mozilla.org/Buildbot/Talos/Tests#TART.2FCART?

Yes. TART is a talos test which, among other things, displays the about:newtab page. It started crashing when a patch to disable Directory tiles was uplifted to beta 32 not long ago (bug 1039881). Without this patch, beta 32 will show Directory Tiles.

The crashes are apparently related to grabbing page thumbnail images in the background for the about:newtab page when Directory tiles are empty (disabled).

A 1 line patch to fix the crash followed. This 1 line patch is apparently what's causing this mixed-content warning icon instead of the lock (on some i10n builds?).

We can backout the 1 line patch and we can live with TART crashing while testing in talos, but it's currently unclear (to me) if there's also a crash scenario outside of talos as well (i.e. for users). This question awaits answer at bug 1039881 comment 40.

If there isn't a crash scenario for users from backing out that 1 line patch, then ultimately this is what we should do for 32 IMHO.
(In reply to Cosmin Malutan from comment #22)
> ...
> So the changeset that causes this failure is:
> http://hg.mozilla.org/releases/mozilla-beta/rev/6790f9333fec Bug 1039881

(In reply to Avi Halachmi (:avih) from comment #40)
> ...
> A 1 line patch to fix the crash followed. This 1 line patch is apparently
> what's causing this mixed-content warning icon instead of the lock (on some
> i10n builds?).

Actually, I'm not 100% sure the bug is due to the 1 line (4.12 at the quoted changeset). The patch includes both disabling the tiles and preventing the crash, so theoretically either of those could cause this bug.

(In reply to Ed Lee :Mardak from comment #31)
> (In reply to Henrik Skupin (:whimboo) from comment #30)
> > Ed, would you mind to create a try server build with this line removed?
> https://tbpl.mozilla.org/?tree=Try&rev=ae262863a691
> 
> I believe builds should appear
> http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/edilee@gmail.com-
> ae262863a691

^ testing with this build should answer if it's caused by disabling DT or by the 1 line to prevent the crash.
(In reply to daniel.gherasim from comment #16)
> Well, that's odd. I also couldn't reproduce with Aurora de. But it
> reproduced with latest Beta de & latest Nightly de. 
Can someone explain how to reproduce this? I have 4 builds:

- 32b5de 08-Aug-2014 http://ftp.mozilla.org/pub/mozilla.org/firefox/releases/32.0b5/mac/de/Firefox%2032.0b5.dmg
- 32b6de 12-Aug-2014 http://ftp.mozilla.org/pub/mozilla.org/firefox/releases/32.0b6/mac/de/Firefox%2032.0b6.dmg
- 32b7de 15-Aug-2014 http://ftp.mozilla.org/pub/mozilla.org/firefox/releases/32.0b7/mac/de/Firefox%2032.0b7.dmg
- 32b8de 19-Aug-2014 http://ftp.mozilla.org/pub/mozilla.org/firefox/releases/32.0b8/mac/de/Firefox%2032.0b8.dmg

Comment 16 was made 2014-08-11, so "latest beta" should have been 32b5. I open up a new tab (about:newtab) then after it loads, I open https://ssl-dv.mozqa.com/ and see the lock.

(In reply to Cosmin Malutan from comment #20)
> This has been introduced between beta 32.0b7 and 32.0b8 and is gone in
> aurora and nightly.
> So it's 100% reproducible for mi with beta 32.0b8.
Comment 20 was made 2014-08-21 and says the bug was just introduced yet this bug was originally filed 2014-07-31.

Bug 1039881 listed as the regressing bug landed on mozilla-beta 2014-08-18, and that falls into the "between b7 and b8" time range.

I also tried testing with 32b8de opening about:newtab then https://ssl-dv.mozqa.com/ or https://www.google.com/ and still see the lock icon. Same thing for 32b6de and 32b7de.
(In reply to Ed Lee :Mardak from comment #42)
> ...
> Comment 20 was made 2014-08-21 and says the bug was just introduced yet this
> bug was originally filed 2014-07-31.

It first landed on aurora on 2014-07-18 without the crash-fix (bug 1039881 comment 4) and backed out quickly. A related change also landed on m-c on the same day (bug 1039881 comment 6).

But I didn't try to follow all the patches and uplifts of this patch (some of it also landed on fx-team branch AFAIK and tbpl was reset since...).
(In reply to daniel.gherasim from comment #12)
> Created attachment 8470001 [details]
> test-case.js
I tried running this in mozmill and the test passed:

$ ./mozmill -b Firefox32b8de.app -t test-case.js 
RESULTS | Passed: 1
$ ./mozmill -b Firefox32b7de.app -t test-case.js 
RESULTS | Passed: 1
$ ./mozmill -b Firefox32b6de.app -t test-case.js 
RESULTS | Passed: 1
$ ./mozmill -b Firefox32b5de.app -t test-case.js 
RESULTS | Passed: 1

And in french too:

$ ./mozmill -b Firefox32b8fr.app -t test-case.js 
RESULTS | Passed: 1
$ ./mozmill -b Firefox32b7fr.app -t test-case.js 
RESULTS | Passed: 1
Some additional info here:
we've only seen this on from the local network, we weren't able to reproduce the issue from inside scl3, so there might be a network or proxy component at play.
Ed Lee, I removed the line and rebuild-ed the firefox but thi issue still reproduces, as I said before the preferance is the one introducing this.
http://hg.mozilla.org/releases/mozilla-beta/rev/6790f9333fec#l4.12

Monica, this issue reproduces only on the first SSL visited page.

> (In reply to Cosmin Malutan from comment #20)
> > This has been introduced between beta 32.0b7 and 32.0b8 and is gone in
> > aurora and nightly.
> > So it's 100% reproducible for mi with beta 32.0b8.
> Comment 20 was made 2014-08-21 and says the bug was just introduced yet this
> bug was originally filed 2014-07-31.
>
> Bug 1039881 listed as the regressing bug landed on mozilla-beta 2014-08-18,
> and that falls into the "between b7 and b8" time range.
This bug reproduces when we have the tiles disabled, this could have been the case for older builds, before the bug marked as a regressors for different locales, where as locale specific we might or we might not have tiles set as preferences. This confirms the concern from comment 41 this can't be causes by changing the preferences but by the absences of the tiles in the about:newtab page.
Considering that setting the preference couldn't add this regresion, I added the preference "browser.newtabpage.directory.source" in the testcase and it reproduces with the latest nightly, so we will have to make a new regresion range here. 
>Cu.import("resource://gre/modules/Services.jsm");
>Services.prefs.QueryInterface(Ci.nsIPrefBranch).setCharPref("browser.newtabpage.directory.source", 
>                                                            "data:application/json,{}");
(In reply to Ed Lee :Mardak from comment #26)
> bobowen/bz, bholley seems to be on PTO for this week, so asking either of
> you for guidance.
> 
> bholley suggested adding the NS_ENSURE_TRUE(GetContextInternal) to
> SetNewDocument to fix a TART crash when landing bug 1039881 directly to beta.
> 
> Would it be possible that added check would cause some logic of detecting
> whether or not to show a lock vs security warn icon? I've asked Cosmin to
> see if removing that line fixes beta.

Given comments 46 and 47, I assume that this question is no longer relevant?

I'm not sure it's an area I know much about, but if you need me to look at this further, let me know.
I made looked for the regression with the preference set, but I've got to the place where the pref was renamed, so it will take a litlle bit longer to find the culprit here.
http://hg.mozilla.org/mozilla-central/rev/9f810f13f251
(In reply to Bob Owen (:bobowen) from comment #48)
> Given comments 46 and 47, I assume that this question is no longer relevant?
That's correct. Thanks for following along (smaug/bz/bholley too). Sorry for bugspam!
(In reply to Cosmin Malutan from comment #49)
> I made looked for the regression with the preference set, but I've got to
> the place where the pref was renamed, so it will take a litlle bit longer to
> find the culprit here.
> http://hg.mozilla.org/mozilla-central/rev/9f810f13f251
Just set "browser.newtabpage.directorySource" to "data:application/json,{}" in addition to setting "browser.newtabpage.directory.source" to "data:application/json,{}"
(In reply to Andrei Eftimie from comment #45)
> Some additional info here:
> we've only seen this on from the local network, we weren't able to reproduce
> the issue from inside scl3, so there might be a network or proxy component
> at play.
What does this mean in terms of how users might hit this bug? E.g., in what situations would visiting https://www.google.com/ show the mixed content warning?
(In reply to Ed Lee :Mardak from comment #52)
> (In reply to Andrei Eftimie from comment #45)
> > Some additional info here:
> > we've only seen this on from the local network, we weren't able to reproduce
> > the issue from inside scl3, so there might be a network or proxy component
> > at play.
> What does this mean in terms of how users might hit this bug? E.g., in what
> situations would visiting https://www.google.com/ show the mixed content
> warning?

That is the big question. I haven't yet figured out _what_ the network conditions must be for this to occur (or if it really is network related).

We do use a proxy in scl3 and that has in the past affected certain parts of our tests.
Cosmin, could you please try this on one CI machine? Maybe take one from staging. Disable the proxy and see if you can reproduce the issue. (I'm not sure if this is feasible, but it might shine some light on how this issue is behaving)
Flags: needinfo?(cosmin.malutan)
Posted file testCase.js (obsolete) —
Thanks Ed Lee, that's what I was doing :) 
The culpruit here is the bug 973532.
http://hg.mozilla.org/mozilla-central/rev/c3b86a5d52b6

Andrei, I'll check that.

Tim do you have any ideea what could cause this behavior here?
Attachment #8470001 - Attachment is obsolete: true
Flags: needinfo?(cosmin.malutan) → needinfo?(ttaubert)
Not without taking a deeper look, no. If this is important you could request the firefox-backlog flag and we can prioritize it accordingly. I have too much on my plate right now, sorry.
Flags: needinfo?(ttaubert)
Thanks Cosmin for doing that search. Bug 973532 did land for Firefox 32 (so not part of current Release 31 which has pref set to {}), and it sounds like this bug is hit only if the directorySource pref is also set to {} -- only done so on beta to turn off directory tiles before release.
Blocks: 973532
(In reply to Andrei Eftimie from comment #53)
> Cosmin, could you please try this on one CI machine? Maybe take one from
> staging. Disable the proxy and see if you can reproduce the issue. (I'm not
> sure if this is feasible, but it might shine some light on how this issue is
> behaving)

There is no need to disable the proxy on such a machine. We never use it when connecting to mozqa.com.
(In reply to Cosmin Malutan from comment #46)
> Monica, this issue reproduces only on the first SSL visited page.
To summarize so far, we have a beta/32-specific bug triggered by a non-standard/nightly pref configuration to turn off a feature for Release that results in the incorrect site identity icon being displayed one time per Firefox session potentially only for certain network configurations. ?
(In reply to Henrik Skupin (:whimboo) from comment #57)
> There is no need to disable the proxy on such a machine. We never use it
> when connecting to mozqa.com.
Whimboo I had already checked that, and disabling the proxy doesn't help to reproduce it on CI nodes.
Ed Lee, that's correct, but it might reproduce again, I'm not sure, if you navigate to about:newtab again.
I honestly don't think this depends on the network configuration but more on the host location, as a guest I incline to say we hit in this issue when only if the host sends the certificate a little bit slower, so it might be a race condition.
Regarding the importance of this bug I think it won't look nice if an user navigates to a secured page to do a payment for example and the identity of the site won't be verified, even if in reality it is.
Yes, having the first HTTPS page be effectively broken (due to mixed content) is a stop ship bug.  Can we get a backout or a fix ASAP.

Also tests please.
Flags: needinfo?(dougt)
Flags: needinfo?(tanvi)
Flags: needinfo?(sstamm)
Cosmin currently owns this bug but I don't think will be providing the fix. We need an owner who can work on a fix today.

Ed - Are you going to take this one?
Flags: needinfo?(edilee)
whimboo says Cosmin and Andrei are out today, so I'm working with whimboo to get this reproducing on my machine to see what options we have.

Although whimboo initially says he cannot reproduce the issue.
Assignee: cosmin.malutan → edilee
Flags: needinfo?(edilee)
The problem here is that we really need someone from SoftVision to check that. It might really be related to their network. There is no way for me to get this reproduced either. And all reports so far are from testruns executed inside the SoftVision network.

Ed, would a HTTP log be useful here? We could grab one tomorrow for investigation. For today it is too late given all are gone.

I think we should move this bug into the appropriate component.
The grey triangle icon comes up when there is Mixed Display Content on a page (ex: https page with http images embedded).  Why would about:newtab cause this icon to show up?  The Mixed Content Blocker looks for "https" pages and then checks the schemes of their embedded resources.  Since about:newtab isn't https, it is exempt, except if it embeds an https iframe.  If about:newtab has an https iframe that embeds an http image, that would cause the grey triangle to show up on about:newtab.  That combined with bug https://bugzilla.mozilla.org/show_bug.cgi?id=947079 might be the reason the grey triangle sometimes sticks around on an https://google.com page.

I need to learn more about how about:newtab works and why turning it off would cause this bug.
(In reply to Tanvi Vyas [:tanvi] from comment #65)
> ...
> I need to learn more about how about:newtab works and why turning it off
> would cause this bug.

Disabling the Directory Tiles in about:newtab leaves empty thumbnail spots. This apparently causes the background thumbnails service to start getting screen dumps of.. pages from the frecency list maybe? On a clean profile - not sure what it would try to grab. Maybe some Mozilla pages? Maybe if something it grabs has mixed content, then it somehow keeps some "mixed-content status" and displays it at the next https page it "properly" loads (i.e. not by the thumbnails service)?
(In reply to Avi Halachmi (:avih) from comment #66)
> Disabling the Directory Tiles in about:newtab leaves empty thumbnail spots.
> This apparently causes the background thumbnails service to start getting
> screen dumps of.. pages from the frecency list maybe? On a clean profile -
> not sure what it would try to grab. Maybe some Mozilla pages? Maybe if
> something it grabs has mixed content, then it somehow keeps some
> "mixed-content status" and displays it at the next https page it "properly"
> loads (i.e. not by the thumbnails service)?

Are the thumbnails loaded via <img src>?  Or is there an <iframe> for each thumbnail with an <img src> in it?  Do you have a link to the code for loading the thumbnails?
(In reply to Tanvi Vyas [:tanvi] from comment #67)
> Or is there an <iframe> for each thumbnail with an <img src> in it?
From my quick searching (so not sure if this is actually the codepath):

http://mxr.mozilla.org/mozilla-central/source/toolkit/components/thumbnails/BackgroundPageThumbs.jsm#118

121     let hostWindow = Services.appShell.hiddenDOMWindow;
122     let iframe = hostWindow.document.createElementNS(HTML_NS, "iframe");
123     iframe.setAttribute("src", "chrome://global/content/mozilla.xhtml");

An iframe is created in the hidden window that gets reused to capture thumbnails in the background.
(In reply to Tanvi Vyas [:tanvi] from comment #67)
> Are the thumbnails loaded via <img src>?
Sorry, if you're referring to the thumbnails in the newtab page, they're loaded as the background of a span:

background-image: url("moz-page-thumb://thumbnail?url=http...

But if the thumbnail does not exist, it'll request the background thumbnailer to capture it:
http://mxr.mozilla.org/mozilla-central/source/browser/base/content/newtab/sites.js#149
149       BackgroundPageThumbs.captureIfMissing(this.url);
Given comment 8 ("can't reproduce manually") and the fact that this is only an issue when running mozmill on certain machines, I think it's much more likely to be a timing-related issue with the test script (or test harness helpers it relies on) than a real issue that affects users.

We should investigate further, but this is not chemspill-worthy yet, and I don't think we need to worry about it for 32.
Assuming this is a timing issue with the test, the background thumbnailing triggered by the new tab page is likely just throwing off the timing in certain cases.
(In reply to :Gavin Sharp [email: gavin@gavinsharp.com] from comment #70)
> We should investigate further, but this is not chemspill-worthy yet, and I
> don't think we need to worry about it for 32.
lmandel, given gavin's comments, should we status-firefox32=wontfix?
Flags: needinfo?(lmandel)
So whatever happens here, some questions to Cosmin we have to get answered because we are missing important information:

* On which platforms does this failure happen?
* Can this be reproduced manually or only with Mozmill?
* How often does it reproduce?
* What happens when you run the test with the same machine but in another network?
* Are all locales affected, or only some? What about en-US?

Maybe a HTTP log would be helpful here:
https://developer.mozilla.org/en-US/docs/HTTP_Logging
Flags: needinfo?(cosmin.malutan)
The reason I suspect this is a timing issue is that I don't have a lot of faith in the mozmill test helpers in general.

Understanding what testCase.js is doing exactly is tricky because of a bunch of confusing mozmill abstractions. I don't know where the canonical mozmill code lives, but let's use https://github.com/mozilla/mozmill/.
 
controller.open is:
https://github.com/mozilla/mozmill/blob/8cad8438b6f6237f449add334f97d731448baff7/mozmill/mozmill/extension/resource/driver/controller.js#L286

which just calls gBrowser.selectedBrowser.loadURI.

Then waitForPageLoad is called without arguments:
https://github.com/mozilla/mozmill/blob/8cad8438b6f6237f449add334f97d731448baff7/mozmill/mozmill/extension/resource/driver/controller.js#L956

which delegates to waitFor which IIRC does event loop spinning and polling (ugh).

It's polling for "hasPageLoaded" returning true for the current tab's content window:

https://github.com/mozilla/mozmill/blob/8cad8438b6f6237f449add334f97d731448baff7/mozmill/mozmill/extension/resource/modules/windows.js#L119

That method does some confusing tracking of global window "loaded" state, which seems to be set to true for a given window when any of "pageshow" or "DOMContentLoaded" fire:
https://github.com/mozilla/mozmill/blob/8cad8438b6f6237f449add334f97d731448baff7/mozmill/mozmill/extension/resource/modules/windows.js#L178
https://github.com/mozilla/mozmill/blob/8cad8438b6f6237f449add334f97d731448baff7/mozmill/mozmill/extension/resource/modules/windows.js#L200
or when a load event fires (or the readyState is "complete" when the listeners are added):
https://github.com/mozilla/mozmill/blob/8cad8438b6f6237f449add334f97d731448baff7/mozmill/mozmill/extension/resource/modules/windows.js#L238

Every time one of those events is fired, a new UUID is created and stored for that window's "id_load_in_transition".

Every time hasPageLoaded is called, id_load_handled is updated to match the last id_load_in_transition, and it returns true if it has changed since the last hasPageLoaded invocation.

Depending on how often waitFor polls hasPageLoaded, the answer could change depending on timing of those various events, and the expect.contain check in that testcase will fail if it is checked before the onSecurityChange for the mozqa.com page has fired.

The code from bug 973532 does potentially affect the "pageshow" event firing for about:newtab, which might be throwing off windows.js' "loaded" state accounting somehow. Someone needs to dig in further and debug that mozmill code.
No longer blocks: 1039881
Flags: needinfo?(lmandel)
(tracking32- because this can't be reproduced outside of mozmill, and not even consistently within mozmill across machines/environments - renominate if that understanding changes)
(In reply to Ed Lee :Mardak from comment #72)
> (In reply to :Gavin Sharp [email: gavin@gavinsharp.com] from comment #70)
> > We should investigate further, but this is not chemspill-worthy yet, and I
> > don't think we need to worry about it for 32.
> lmandel, given gavin's comments, should we status-firefox32=wontfix?

Gavin already set this to tracking- for 32. I had flagged for tracking and listed this as an RC2 driver after speaking with mmc. However, I don't think that either of us assumed that this would not reproduce manually. Given that we still can't reproduce manually, I agree that this doesn't look to be required for 32 at this point. As Gavin said, please renominate and ping me if we figure out how to reproduce this manually.
Posted video manual.ogv
(In reply to Henrik Skupin (:whimboo) from comment #73)
> So whatever happens here, some questions to Cosmin we have to get answered
> because we are missing important information:
> 
> * On which platforms does this failure happen?
So far on Linux and OSX
> * Can this be reproduced manually or only with Mozmill?
Yes it can be reproduced manually, attached is the screencast
> * How often does it reproduce?
On my station with mozmill almost every time, manually at least once in 5 tries. 
> * What happens when you run the test with the same machine but in another
> network?
Not sure how I'll do that.
> * Are all locales affected, or only some? What about en-US?
I mentioned this already, it was failing for a while with localized build which doesn't had tiles set, and now that with the preference that disables the tiles on beta (Bug 1039881) it reproduces on en-US too.

> Maybe a HTTP log would be helpful here:
> https://developer.mozilla.org/en-US/docs/HTTP_Logging
I will attach the log in a second.
Flags: needinfo?(cosmin.malutan)
Posted file http_log.txt.zip
Here is the HTTP logging file.
http://ftp.mozilla.org/pub/mozilla.org/firefox/releases/32.0b8/

Steps to reproduce:
Navigate to: about:newtab
Navigate to a secured webpage (https://ssl-dv.mozqa.com/)

Check the page identity:
Expected: verified
Actual result: mixed-content
[Tracking Requested - why for this release]:

(In reply to :Gavin Sharp [email: gavin@gavinsharp.com] from comment #74)
> The reason I suspect this is a timing issue is that I don't have a lot of
> faith in the mozmill test helpers in general.

We found dozen of bugs lately with Mozmill, and so far any other suspicion regarding bad behavior in Mozmill has been turned out not being true. You should really give more faith into it. But I would appreciate your feedback regarding real issues, and things we have to improve.

> Understanding what testCase.js is doing exactly is tricky because of a bunch
> of confusing mozmill abstractions. I don't know where the canonical mozmill
> code lives, but let's use https://github.com/mozilla/mozmill/

That is the correct repository.

> Depending on how often waitFor polls hasPageLoaded, the answer could change
> depending on timing of those various events, and the expect.contain check in
> that testcase will fail if it is checked before the onSecurityChange for the
> mozqa.com page has fired.

Given that this problem can be reproduced manually as given by Cosmin above by a screencast, the mentioned situation should not apply. But in general I would be interested when onSecurityChange is getting fired for all the events coming in by loading a page. I'm happy to update our code to include that if necessary.

> > * What happens when you run the test with the same machine but in another
> > network?
> Not sure how I'll do that.

Unplug the machine and e.g. take it at home with you? Or if you have a mobile connection, so you don't have to use the SV network.

> > * Are all locales affected, or only some? What about en-US?
> I mentioned this already, it was failing for a while with localized build
> which doesn't had tiles set, and now that with the preference that disables
> the tiles on beta (Bug 1039881) it reproduces on en-US too.

Doing explanations somewhere inbetween it is hard to follow in such a long bug. That's why all steps etc should be as best contained within a single comment.

We cannot be sure so far how many people will be affected by this problem. So re-requesting tracking for Firefox 32 until we know what's going on here.
I've managed to reproduce this manually with the steps outline in comment 79, same network, different machine (OSX 10.9). With the current 32 release build, en-US.
I've got it to reproduce at the 6th attempt.

I'll check different networks now.
(In reply to Cosmin Malutan from comment #78)
> Created attachment 8479666 [details]
> http_log.txt.zip
> Here is the HTTP logging file.
Thanks. Doing a quick grep for uri=

2014-08-27 07:15:46.951830 uri=https://mozorg.cdn.mozilla.net/media/img/favicon.ico#-moz-resolution=16,16
2014-08-27 07:15:46.951948 uri=https://mozorg.cdn.mozilla.net/media/img/favicon.ico#-moz-resolution=16,16
2014-08-27 07:15:47.332473 uri=https://mozorg.cdn.mozilla.net/media/img/firefox/australis/logo.png?2014-03
2014-08-27 07:15:47.332713 uri=https://mozorg.cdn.mozilla.net/media/img/firefox/australis/logo.png?2014-03
2014-08-27 07:15:48.298971 uri=https://ssl-dv.mozqa.com/
2014-08-27 07:15:48.299868 uri=https://ssl-dv.mozqa.com/
2014-08-27 07:15:48.415744 uri=https://www.mozilla.org/en-US/firefox/32.0/firstrun/
2014-08-27 07:15:48.415858 uri=https://www.mozilla.org/en-US/firefox/32.0/firstrun/
2014-08-27 07:15:48.671653 uri=http://gtssldv-ocsp.geotrust.com/
2014-08-27 07:15:48.671809 uri=http://gtssldv-ocsp.geotrust.com/
2014-08-27 07:15:49.340285 uri=https://mozorg.cdn.mozilla.net/media/css/tabzilla-min.css?build=0ee4bf0

Anyone more familiar with the http logging output? The requests to http://geotrust happen within the same second as loading https://ssl-dv
(In reply to Cosmin Malutan from comment #78)
> Created attachment 8479666 [details]
> http_log.txt.zip

This log file is damn huge. I will work with Cosmin so we can get it reduced for analysis.
(In reply to Ed Lee :Mardak from comment #82)
> Anyone more familiar with the http logging output? The requests to
> http://geotrust happen within the same second as loading https://ssl-dv

It might be that this comes from the first run page. Please wait a bit until we have the reduced log file ready.
(In reply to Henrik Skupin (:whimboo) from comment #84)
> It might be that this comes from the first run page.
Cosmin, can you wait 30 seconds between each action while still reproducing this bug?

I.e., open firefox, wait 30 seconds for first run page to completely load, open about:newtab, wait 30 seconds, open ssl-dv?
With a delay between about:newtab and the secured page it will pass, with a delay only between first page and the about:newtab it will still pass.
What happens here is that the tile is still loading for me while the about:newtab has already loaded and when we navigate to a secure webpage we get mixed-content
Posted file testcase (obsolete) —
So the problem here is indeed the loading of a SSL page in a background tab. With a fresh profile this will always be the first run page. To simulate that I have updated the testcase with a slow loading page from mozqa.com. So when loading a SSL page after about:newtab, and the load finishes around the same time, you will see the broken SSL identifier.
Attachment #8469970 - Attachment is obsolete: true
Attachment #8477279 - Attachment is obsolete: true
Attachment #8478981 - Attachment is obsolete: true
To summarize:
With a fresh profile, when we open the firefox it will take us to the first-run start page. Afterwards opening the "about:newtab", with no tiles preset, will show the first-run page as a tile, being one from history. Now when navigating to a secured address instead of having a verified identity we will have mixed-content.
This behaviour was visible on localized build which had no tiles set (and the firstrun page will be given as the one and single tile), and on Beta en-US from build 32.0b8 when the preference to have no tiles by default was added (Bug 1039881). Doing a regression range with the preference set to nothing took me to bug 973532.

To reproduce this manually.
* Start firefox with a fresh profile
* Navigate to a slow-loading page: https://ssl-dv.mozqa.com/data/firefox/layout/delayed_load.php?seconds=3
* Open about:newtab
* Navigate to a secured webpage (https://ssl-dv.mozqa.com/)
Posted file testcase
One more testcase update with the following changes and observations:

* Loading the first-run page is necessary (here for the cy locale)
* Using about:newtab is mandatory. With about:blank I don't see the problem
* Loading the delayed SSL page is not necessary
* The test does not fail if about:newtab is unloaded before the thumbnail is shown
* The failure happens only for the first time. Opening another new tab and loading a different SSL page shows the correct favicon.
Attachment #8479705 - Attachment is obsolete: true
Here the manual steps:

1. Create a new profile
2. Wait until the first run page has been loaded
3. Open a new tab
4. Paste 'https://ssl-dv.mozqa.com' immediately into the locationbar
5. Press Return right before the thumbnail for the first-run page is shown

This problem here is really a race condition and very hard to hit even manually. I think that we get the security info from about:newtab applied to the afterward loaded SSL page.
Oh, and I think that given the current observation we should not block Firefox 32.0. I will leave it up for drivers or module owners to do the final decision.
What is the best component to move this bug to?
Summary: [l10n] Test failure "There is a lock icon - 'url("chrome://browser/skin/identity-icons-https-mixed-display.png")' should contain 'identity-icons-https.png'" in testDVCertificate.js → Mixed content favicon is shown when loading a SSL site right after opening a new tab
Btw I can reproduce this also with Firefox 31.0ESR. So each version which is using directory tiles is fine.
(In reply to Henrik Skupin (:whimboo) from comment #93)
> Btw I can reproduce this also with Firefox 31.0ESR.
So this probably isn't a regression of bug 973532.

(In reply to Henrik Skupin (:whimboo) from comment #90)
> This problem here is really a race condition and very hard to hit even
> manually. I think that we get the security info from about:newtab applied to
> the afterward loaded SSL page.
Getting a http log of a failing case and a not failing case should be useful in debugging what's going on.
Assignee: edilee → nobody
I was able to reproduce it with relatively high consistency with a "normal" network and Firefox 32 b8 en-US ( http://ftp.mozilla.org/pub/mozilla.org/firefox/releases/32.0b8/win32/en-US/ ).

Here are my steps:

1. Delete the profile.
2. Open Firefox and dismiss any dialogs (e.g. default browser).
3. Wait till the first run page loads and settles (I wait 10s).
4. Open a new tab (CTRL-T) and paste an https url _without_ pressing enter (I use https://ssl-dv.mozqa.com/ ).

Status right after opening the new tab: 1 tile names "Welcome to Firefox" with a blank-white image.

5. This is the time-critical step. I press enter about 2.5-3s after I open the tab, such that it's before the first-run thumbnail/screenshot appears at the tile, but such that once it goes away from the newtab page to the https page (about 1-2s after pressing enter) - the first-run thumbnail thumb is already displayed.

This gives me very high rate of reproducing the mixed-content warning bug.

If I press enter right after opening the newtab page, or wait until the welcome thumbnail/screenshot appears at the tile - then it shows the normal lock icon.


And now the interesting bit: it can happen again at the same session, i.e., it's not limited to the first https visit.

The key to reproduce it again on the same session is to press enter for some https url _before_ the newtab shows the thumbnail/screenshot, but such that it leaves the newtab page after the thumbnail is displayed.

E.g., after completing step 5 above and reproducing it once, continue with the following steps:

6. prepare: copy to the clipboard: https://yahoo.com
7. type cnn.com and press enter on one of the tabs.
8. Once cnn.com completes loading (i.e. the throbber stops spinning), open a new tab, make sure it still doesn't have the cnn thumbnail/screenshot, paste the https url (https://yahoo.com) and press enter before the cnn screenshot appears at the tiles, but such that it leaves the newtab page to yahoo after the newtab page shows the cnn thumbnail.

9. Mixed content warning for https://yahoo.com as well.

Reloading the page makes the warning go away and it shows the lock icon.
Tim, re comment 95, what can you make of it?
Flags: needinfo?(ttaubert)
Given that:
1. this is difficult to reproduce
2. this shipped in 31 (impacts 31 ESR - comment 93) and I'm not aware of any user reports of this issue
3. this seems like it will impact a new profile more than an existing profile that has history
4. the workaround is to refresh the page
5. we don't yet have a fix

I think we should ship 32 as is and work on a fix for 33 (assuming 33 is affected, see below). Depending on the complexity of the fix, I think this is a candidate for a ride-along if we have to chemspill.

I'm leaving status as 32 affected so that we can track for any chemspill. I have also set ESR31 as affected based on comment 93. Given this comment, I have set status for 33 and 34 to ? and would like confirmation whether this issue impacts these releases as well.
This affects nightly 34 as well.

Same STR as on comment 95:

- Disable Directory tiles:
browser.newtabpage.directory.source = data:application/json,{}

- Visit a page X.
- Once it stops loading, immediately open a newtab with CTRL-T.
- Paste any https url and press enter before X shows as a thumbnail/screenshot at its tile, but such that it leaves the newtab page after X thumbnail is shown.

--> Mixed content warning.
The onSecurityChange() notification that causes the mixed content warning has the following backtrace, not sure if that helps:

* thread #1: tid = 0x58b7f3, 0x0000000102c17591 XUL`nsSecureBrowserUIImpl::TellTheWorld(this=<unavailable>, warnSecurityState=1606405927, aRequest=0x0000000125ad1058) + 129 at nsSecureBrowserUIImpl.cpp:1407, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x0000000102c17591 XUL`nsSecureBrowserUIImpl::TellTheWorld(this=<unavailable>, warnSecurityState=1606405927, aRequest=0x0000000125ad1058) + 129 at nsSecureBrowserUIImpl.cpp:1407
    frame #1: 0x0000000102c16416 XUL`nsSecureBrowserUIImpl::EvaluateAndUpdateSecurityState(nsIRequest*, nsISupports*, bool, bool) [inlined] nsSecureBrowserUIImpl::UpdateSecurityState(this=<unavailable>, aRequest=<unavailable>) + 422 at nsSecureBrowserUIImpl.cpp:1305
    frame #2: 0x0000000102c1640b XUL`nsSecureBrowserUIImpl::EvaluateAndUpdateSecurityState(this=<unavailable>, aRequest=<unavailable>, info=<unavailable>, withNewLocation=<unavailable>, withNewSink=<unavailable>) + 411 at nsSecureBrowserUIImpl.cpp:550
    frame #3: 0x0000000102c177da XUL`nsSecureBrowserUIImpl::OnLocationChange(this=0x000000011baad280, aWebProgress=<unavailable>, aRequest=0x0000000125ad1058, aLocation=<unavailable>, aFlags=<unavailable>) + 474 at nsSecureBrowserUIImpl.cpp:1487
    frame #4: 0x00000001016eec4a XUL`nsDocLoader::FireOnLocationChange(this=0x0000000126021000, aWebProgress=0x0000000126021028, aRequest=0x0000000125ad1058, aUri=0x000000011f240120, aFlags=0) + 282 at nsDocLoader.cpp:1282
    frame #5: 0x0000000102b12a03 XUL`nsDocShell::CreateContentViewer(this=<unavailable>, aContentType=<unavailable>, request=<unavailable>, aContentHandler=<unavailable>) + 1971 at nsDocShell.cpp:8590
    frame #6: 0x0000000102b1e58f XUL`nsDSURIContentListener::DoContent(this=0x000000010f6fe9e0, aContentType=0x000000012395ec88, aIsContentPreferred=<unavailable>, request=0x0000000125ad1058, aContentHandler=0x000000011bb1eba8, aAbortProcess=0x00007fff5fbfd287) + 335 at nsDSURIContentListener.cpp:141
    frame #7: 0x00000001016f0636 XUL`nsDocumentOpenInfo::TryContentListener(this=<unavailable>, aListener=0x000000010f6fe9e0, aChannel=0x0000000125ad1058) + 406 at nsURILoader.cpp:726
    frame #8: 0x00000001016efb8b XUL`nsDocumentOpenInfo::DispatchContent(this=0x000000011bb1eb80, request=0x0000000125ad1058, aCtxt=<unavailable>) + 443 at nsURILoader.cpp:401
    frame #9: 0x00000001016ef980 XUL`nsDocumentOpenInfo::OnStartRequest(this=0x000000011bb1eb80, request=0x0000000125ad1058, aCtxt=0x0000000000000000) + 192 at nsURILoader.cpp:262
    frame #10: 0x00000001011e4adb XUL`mozilla::net::nsHttpChannel::CallOnStartRequest(this=0x0000000125ad1000) + 635 at nsHttpChannel.cpp:913
    frame #11: 0x00000001011e7371 XUL`mozilla::net::nsHttpChannel::ContinueProcessNormal(this=0x0000000125ad1000, rv=<unavailable>) + 417 at nsHttpChannel.cpp:1516
    frame #12: 0x00000001011e64ad XUL`mozilla::net::nsHttpChannel::ProcessNormal(this=<unavailable>) + 285 at nsHttpChannel.cpp:1451
    frame #13: 0x00000001011e6122 XUL`mozilla::net::nsHttpChannel::ProcessResponse(this=0x0000000125ad1000) + 1298 at nsHttpChannel.cpp:1361
    frame #14: 0x00000001011ee505 XUL`mozilla::net::nsHttpChannel::OnStartRequest(this=0x0000000125ad1000, request=<unavailable>, ctxt=<unavailable>) + 405 at nsHttpChannel.cpp:4948
    frame #15: 0x00000001010fcf89 XUL`nsInputStreamPump::OnStateStart(this=0x0000000122cb19c0) + 201 at nsInputStreamPump.cpp:531
    frame #16: 0x00000001010fcd19 XUL`nsInputStreamPump::OnInputStreamReady(this=0x0000000122cb19c0, stream=<unavailable>) + 233 at nsInputStreamPump.cpp:433
    frame #17: 0x000000010106ea60 XUL`nsInputStreamReadyEvent::Run(this=<unavailable>) + 32 at nsStreamUtils.cpp:88
    frame #18: 0x000000010107f984 XUL`nsThread::ProcessNextEvent(this=0x000000010030ed00, aMayWait=<unavailable>, aResult=0x00007fff5fbfd86f) + 740 at nsThread.cpp:769
    frame #19: 0x000000010109d000 XUL`NS_ProcessPendingEvents(aThread=<unavailable>, aTimeout=20) + 80 at nsThreadUtils.cpp:207
    frame #20: 0x000000010226eed3 XUL`nsBaseAppShell::NativeEventCallback(this=0x000000010a6b2100) + 99 at nsBaseAppShell.cpp:98
    frame #21: 0x00000001022b2eda XUL`nsAppShell::ProcessGeckoEvents(aInfo=0x000000010a6b2100) + 282 at nsAppShell.mm:374
Flags: needinfo?(ttaubert)
(In reply to Tim Taubert [:ttaubert] from comment #99)
> The onSecurityChange() notification that causes the mixed content warning
> has the following backtrace, not sure if that helps:
> 
>     frame #12: 0x00000001011e64ad
> XUL`mozilla::net::nsHttpChannel::ProcessNormal(this=<unavailable>) + 285 at
> nsHttpChannel.cpp:1451
This seems to be the last function in the backtrace that appears in the http log.

Looking at the log in attachment 8479666 [details]:

[this=7f5e3eb81800] = nsHttpChannel https://www.mozilla.org/en-US/firefox/32.0/firstrun/
[this=7f5e2aab0000] = nsHttpChannel https://ssl-dv.mozqa.com/
[this=7f5e2aac0000] = nsHttpChannel https://www.mozilla.org/en-US/firefox/32.0/firstrun/ no-cache

Note, there are 2 http channels created for https://www.mozilla.org/en-US/firefox/32.0/firstrun/ where the first one in the log is the tab load and the second one should be the thumbnail as the http request includes "Pragma: no-cache", "Cache-Control: no-cache" headers which seems to match up with the thumbnailer's defaultFlags: http://mxr.mozilla.org/mozilla-central/source/toolkit/components/thumbnails/content/backgroundPageThumbsContent.js#31

07:15:48.298954 Creating nsHttpChannel [ssl-dv.mozqa.com]
07:15:48.304713 nsHttpChannel::SetupTransaction [ssl-dv.mozqa.com]
07:15:48.415729 Creating nsHttpChannel [mozilla.org]
07:15:48.415966 nsHttpChannel::SetupTransaction [mozilla.org]
07:15:49.334691 nsHttpChannel::OnStartRequest [mozilla.org]
07:15:49.334947 nsHttpChannel::ProcessNormal [mozilla.org]
07:15:49.335525 nsHttpChannel::OnStopRequest [mozilla.org]
07:15:51.154475 nsHttpChannel::OnStartRequest [ssl-dv.mozqa.com]
07:15:51.154587 nsHttpChannel::ProcessNormal [ssl-dv.mozqa.com]
07:15:51.176610 nsHttpChannel::OnStopRequest [ssl-dv.mozqa.com]

So it seems that the page request for ssl-dv starts before the thumbnail request and finishes after the thumbnail is done.
Seems caused by the thumbnail request only. The background image request that is, not actually the background capturing.
Simple STR:

1) Open a new tab (with about:newtab of course)
2) Paste and execute in the web console:
   window.location = "https://ssl-dv.mozqa.com/"; gGrid.sites[0].refreshThumbnail();

I guess that the top left thumbnail should actually have a thumbnail.
I have absolutely no clue about all this but my wild guess is that although we already started loading a new page the background image request counts towards the new page's resources. And because that isn't a TLS request we see a mixed content warning.
There's various calls to ProcessNormal in between the times of creating the channel for ssl-dv and when it finishes.

07:15:48.298954 Creating nsHttpChannel [ssl-dv.mozqa.com]
07:15:51.176610 nsHttpChannel::OnStopRequest [ssl-dv.mozqa.com]

07:15:48.571066 ProcessNormal https://snippets.mozilla.com/4/Firefox/32.0/201408..
07:15:49.334947 ProcessNormal https://www.mozilla.org/en-US/firefox/32.0/firstrun/
07:15:49.375001 ProcessNormal https://mozorg.cdn.mozilla.net/media/css/tabzilla-..
07:15:49.376808 ProcessNormal https://mozorg.cdn.mozilla.net/media/css/firefox_t..
07:15:49.381635 ProcessNormal https://mozorg.cdn.mozilla.net/media/js/firefox-mi..
07:15:49.387524 ProcessNormal https://mozorg.cdn.mozilla.net/media/js/site-min.j..
07:15:49.391434 ProcessNormal https://mozorg.cdn.mozilla.net/en-US/tabzilla/tabz..
07:15:49.410010 ProcessNormal https://mozorg.cdn.mozilla.net/media/js/firefox_to..
07:15:49.522339 ProcessNormal https://cdn.optimizely.com/js/246059135.js
07:15:49.664033 ProcessNormal https://mozorg.cdn.mozilla.net/media/img/firefox/a..
07:15:49.716912 ProcessNormal https://ssl.google-analytics.com/__utm.gif?utmwv=5..
07:15:49.723879 ProcessNormal https://mozorg.cdn.mozilla.net/media/img/sandstone..
07:15:49.725528 ProcessNormal https://mozorg.cdn.mozilla.net/media/img/tabzilla/..
07:15:49.734525 ProcessNormal https://mozorg.cdn.mozilla.net/media/img/sandstone..
07:15:49.735256 ProcessNormal https://mozorg.cdn.mozilla.net/media/fonts/OpenSan..
07:15:49.746571 ProcessNormal https://mozorg.cdn.mozilla.net/media/fonts/OpenSan..
07:15:49.775482 ProcessNormal https://mozorg.cdn.mozilla.net/media/img/firefox/a..
07:15:49.776380 ProcessNormal https://mozorg.cdn.mozilla.net/media/img/firefox/a..
07:15:49.776854 ProcessNormal https://safebrowsing.google.com/safebrowsing/downl..
07:15:49.784801 ProcessNormal https://mozorg.cdn.mozilla.net/media/img/firefox/a..
07:15:49.795099 ProcessNormal https://ssl.google-analytics.com/__utm.gif?utmwv=5..
07:15:49.836451 ProcessNormal https://ssl.google-analytics.com/__utm.gif?utmwv=5..
07:15:49.842452 ProcessNormal https://ssl.google-analytics.com/__utm.gif?utmwv=5..
07:15:50.334766 ProcessNormal http://clients1.google.com/ocsp
07:15:50.388844 ProcessNormal https://safebrowsing.google.com/safebrowsing/downl..
07:15:50.473384 ProcessNormal https://safebrowsing-cache.google.com/safebrowsing..
07:15:50.548620 ProcessNormal https://safebrowsing-cache.google.com/safebrowsing..
07:15:50.630643 ProcessNormal https://safebrowsing-cache.google.com/safebrowsing..
07:15:50.705378 ProcessNormal https://safebrowsing-cache.google.com/safebrowsing..
07:15:50.790982 ProcessNormal https://safebrowsing-cache.google.com/safebrowsing..
07:15:50.840616 ProcessNormal https://safebrowsing-cache.google.com/safebrowsing..
07:15:50.908970 ProcessNormal https://safebrowsing-cache.google.com/safebrowsing..
07:15:51.021427 ProcessNormal https://safebrowsing-cache.google.com/safebrowsing..
07:15:51.152420 ProcessNormal https://safebrowsing-cache.google.com/safebrowsing..
07:15:51.154587 ProcessNormal https://ssl-dv.mozqa.com/

If we're only looking for the non-https, then the only one is 07:15:50.334766 ProcessNormal http://clients1.google.com/ocsp
Kai, it looks like we're getting erroneous mixed-content warnings here due to a background-image request that is fired shortly after navigating away from about:newtab (see comment #102). Looking at nsSecureBrowserUIImpl.cpp it seems like you fixed something similar in bug 165301...

Shouldn't we ignore the bg-img request in OnStateChange()? Due to the UpdateSubrequestMembers() call we increase mSubRequestsNoSecurity and thus end up in lis_mixed_security.

Any hints appreciated!
Flags: needinfo?(kaie)
A workaround only for about:newtab would be to set URI_IS_LOCAL_RESOURCE for the PageThumbsProtocol (moz-page-thumb://). That would ignore the resource load as it doesn't hit the network anyway. I wasn't able to reproduce the issue with pages other than about:newtab.
This looks like a duplicate of bug https://bugzilla.mozilla.org/show_bug.cgi?id=947079 and a problem with nsSecureBrowserUIImpl (also see https://bugzilla.mozilla.org/show_bug.cgi?id=832834).

If the moz-page-thumb:// protocol never makes network requests (the thumbnails are always retrieved locally) you can set them as URI_IS_LOCAL_RESOURCE, as Tim has suggested - http://mxr.mozilla.org/mozilla-central/source/netwerk/base/public/nsIProtocolHandler.idl#213.
Depends on: 947079
Ed, Gavin, can any of you please review? Drew is on PTO currently.
Assignee: nobody → ttaubert
Attachment #8480454 - Flags: review?(gavin.sharp)
Attachment #8480454 - Flags: review?(edilee)
Flags: needinfo?(kaie)
Component: Mozmill Tests → General
Product: Mozilla QA → Firefox
QA Contact: hskupin
Whiteboard: [mozmill-test-failure] → [mozmill]
Comment on attachment 8480454 [details] [diff] [review]
0001-Bug-1046645-Mark-moz-page-thumb-as-local-resources-t.patch

I'm not a peer, but f+ for passing the mochitest I wrote in the process of debugging this issue.
Attachment #8480454 - Flags: review?(edilee) → feedback+
Posted patch v1 testSplinter Review
Comment on attachment 8480454 [details] [diff] [review]
0001-Bug-1046645-Mark-moz-page-thumb-as-local-resources-t.patch

Seems like we should probably add URI_IS_LOCAL_FILE too (AFAICT we are garanteed to always point to an underlying file:// channel).
Attachment #8480454 - Flags: review?(gavin.sharp) → review+
(In reply to :Gavin Sharp [email: gavin@gavinsharp.com] from comment #111)
> Seems like we should probably add URI_IS_LOCAL_FILE too (AFAICT we are
> garanteed to always point to an underlying file:// channel).
I was looking into the other flags:
http://mxr.mozilla.org/mozilla-central/source/netwerk/base/public/nsIProtocolHandler.idl#146

Of the 5 "MUST SET ONE OF THE FOLLOWING FIVE FLAGS" flags:
- URI_LOADABLE_BY_ANYONE
- URI_DANGEROUS_TO_LOAD
- URI_IS_UI_RESOURCE
- URI_IS_LOCAL_FILE
- URI_LOADABLE_BY_SUBSUMERS

We already have URI_DANGEROUS_TO_LOAD, so I wasn't sure if adding more of these "5 flags" is possible.
NS_IMETHODIMP
nsAndroidProtocolHandler::GetProtocolFlags(uint32_t *result)
{
    *result = URI_STD | URI_IS_UI_RESOURCE | URI_IS_LOCAL_RESOURCE | URI_NORELATIVE | URI_DANGEROUS_TO_LOAD;
    return NS_OK;
}

nsAndroidProtocolHandler sets two of those as well so I assume that this is possible? I couldn't find any other protocol handlers doing that though.
Adding more than one should be fine if they all apply, the restriction is that you must include at least one.
By you adding :bz to the CC list I read that as a comment by him :/ If he tells us otherwise we can still back it out...

https://hg.mozilla.org/integration/fx-team/rev/e74b08895ef7
These flags are mean to be mutually exclusive.  I thought we had asserts to that effect, but apparently not.  :(  We should update the IDL comment to make this clearer and add such asserts....

Gavin, URI_IS_LOCAL_FILE is a claim about the origins of pages loaded from the URI, not whether it has a file:// channel under the hood.
(In reply to Boris Zbarsky [:bz] from comment #116)
> These flags are mean to be mutually exclusive.  I thought we had asserts to
> that effect, but apparently not.  :(  We should update the IDL comment to
> make this clearer and add such asserts....

Should file a follow-up to fix the Android protocol handler.

> Gavin, URI_IS_LOCAL_FILE is a claim about the origins of pages loaded from
> the URI, not whether it has a file:// channel under the hood.

So I'll just remove that again.
(In reply to Tim Taubert [:ttaubert] from comment #117)
> (In reply to Boris Zbarsky [:bz] from comment #116)
> > These flags are mean to be mutually exclusive.  I thought we had asserts to
> > that effect, but apparently not.  :(  We should update the IDL comment to
> > make this clearer and add such asserts....
> 
> Should file a follow-up to fix the Android protocol handler.

Filed bug 1061786.
(In reply to Tim Taubert [:ttaubert] from comment #117)
> (In reply to Boris Zbarsky [:bz] from comment #116)
> > Gavin, URI_IS_LOCAL_FILE is a claim about the origins of pages loaded from
> > the URI, not whether it has a file:// channel under the hood.
> 
> So I'll just remove that again.

Sigh, my head seems to be a little confused. I'll back out both of the changesets as soon as the tree re-opens. As Ed states in comment #112 we already have URI_DANGEROUS_TO_LOAD.
(In reply to Tim Taubert [:ttaubert] from comment #120)
> 
> Sigh, my head seems to be a little confused. I'll back out both of the
> changesets as soon as the tree re-opens. As Ed states in comment #112 we
> already have URI_DANGEROUS_TO_LOAD.

Of the 5 mutually exclusive flags, I'm not sure which of URI_IS_LOCAL_FILE or URI_DANGEROUS_TO_LOAD is the right one.  However, URI_IS_LOCAL_RESOURCE is not one of the 5, so you can keep that flag which fixes the mixed content icon issue in this bug.
> Of the 5 mutually exclusive flags, I'm not sure which of URI_IS_LOCAL_FILE or
> URI_DANGEROUS_TO_LOAD is the right one.

It's simple.  Should a random file:// page be able to link to it?  If not, URI_IS_LOCAL_FILE is probably wrong.

URI_IS_LOCAL_RESOURCE is supposed to communicate something about where things are loaded from, not something about who can load what, so yes, it's fine to use here.
(In reply to Tanvi Vyas [:tanvi] from comment #121)
> (In reply to Tim Taubert [:ttaubert] from comment #120)
> > 
> > Sigh, my head seems to be a little confused. I'll back out both of the
> > changesets as soon as the tree re-opens. As Ed states in comment #112 we
> > already have URI_DANGEROUS_TO_LOAD.
> 
> Of the 5 mutually exclusive flags, I'm not sure which of URI_IS_LOCAL_FILE
> or URI_DANGEROUS_TO_LOAD is the right one.  However, URI_IS_LOCAL_RESOURCE
> is not one of the 5, so you can keep that flag which fixes the mixed content
> icon issue in this bug.

Right, maybe I shouldn't push stuff today. Thanks for checking that again.
Comment on attachment 8480676 [details] [diff] [review]
v1 test

This test fails without your patch on all except osx10.8 opt (??):
https://tbpl.mozilla.org/?tree=Try&rev=7645b2dcd5d4

All green with the patch:
https://tbpl.mozilla.org/?tree=Try&rev=46b121ff4d56
Attachment #8480676 - Flags: review?(ttaubert)
(In reply to Tim Taubert [:ttaubert] from comment #105)
> Kai, it looks like we're getting erroneous mixed-content warnings here due
> to a background-image request that is fired shortly after navigating away
> from about:newtab (see comment #102). Looking at nsSecureBrowserUIImpl.cpp
> it seems like you fixed something similar in bug 165301...
> 
> Shouldn't we ignore the bg-img request in OnStateChange()? Due to the
> UpdateSubrequestMembers() call we increase mSubRequestsNoSecurity and thus
> end up in lis_mixed_security.
> 
> Any hints appreciated!

This was a long time ago, and I don't remember the details. I would have to reread the code to remember. I think Honza had worked on this code more recently, and I hope he has a fresher memory, adding him to cc.
Comment on attachment 8480676 [details] [diff] [review]
v1 test

Review of attachment 8480676 [details] [diff] [review]:
-----------------------------------------------------------------

The test might trigger the failure but it looks a little too complicated to me. We should do something like in comment #102: load about:newtab, navigate away and refresh the first thumbnail.
Attachment #8480676 - Flags: review?(ttaubert)
Comment on attachment 8480454 [details] [diff] [review]
0001-Bug-1046645-Mark-moz-page-thumb-as-local-resources-t.patch

Approval Request Comment
[Feature/regressing bug #]: Not sure...
[User impact if declined]: False mixed content warning upon navigating away from about:newtab.
[Describe test coverage new/current, TBPL]: Working on a test, manual testing seems fine.
[Risks and why]: Very simple one-line fix that shouldn't affect any behavior other than that it prevents a mixed content warning due to moz-page-thumb:// requests being flagged as a local resource now.
[String/UUID change made/needed]: None.
Attachment #8480454 - Flags: approval-mozilla-release?
Attachment #8480454 - Flags: approval-mozilla-beta?
Attachment #8480454 - Flags: approval-mozilla-aurora?
Comment on attachment 8480454 [details] [diff] [review]
0001-Bug-1046645-Mark-moz-page-thumb-as-local-resources-t.patch

This fix has been demonstrated good. Patch is approved for Beta and Aurora.

ttaubert/mardak - I would still like to see a test for this case. Please ensure that the test is written, preferably soon.
Attachment #8480454 - Flags: approval-mozilla-beta?
Attachment #8480454 - Flags: approval-mozilla-beta+
Attachment #8480454 - Flags: approval-mozilla-aurora?
Attachment #8480454 - Flags: approval-mozilla-aurora+
(In reply to Lawrence Mandel [:lmandel] from comment #129)
> ttaubert/mardak - I would still like to see a test for this case. Please
> ensure that the test is written, preferably soon.

If it turns out to take longer than expected we could create a simple test via Mozmill. That was how we found this bug.
(In reply to Tim Taubert [:ttaubert] from comment #126)
> The test might trigger the failure but it looks a little too complicated to
> me. We should do something like in comment #102: load about:newtab, navigate
> away and refresh the first thumbnail.
That's exactly what the test does. It just has some test sanity checks to prep things:

prep1) make sure the thumbnailing is enabled (default off in some tests)
prep2) make sure the thumbnail doesn't exist from some previous test
business1) load about:newtab
business2) navigate away
business3) refresh first thumbnail
business4) check that identity box is showing the right content
(In reply to Ed Lee :Mardak from comment #131)
> (In reply to Tim Taubert [:ttaubert] from comment #126)
> > The test might trigger the failure but it looks a little too complicated to
> > me. We should do something like in comment #102: load about:newtab, navigate
> > away and refresh the first thumbnail.
> That's exactly what the test does. It just has some test sanity checks to
> prep things:

Yes, what I was saying is that we don't need some of this. AFAIU we don't need a thumbnail to exist and don't need thumbnailing enabled. By calling |gGrid.sites[0].refreshThumbnail()| we start a request that should cause the mixed content warning. It is not the background service causing the warning but the request for the background image. The code in the test is thus a little misguiding.
Moving over to Ed, he's working on the test.
Assignee: ttaubert → edilee
Flags: qe-verify-
Comment on attachment 8480454 [details] [diff] [review]
0001-Bug-1046645-Mark-moz-page-thumb-as-local-resources-t.patch

This bug will ship in Firefox 32.0.1.
Attachment #8480454 - Flags: approval-mozilla-release? → approval-mozilla-release+
https://hg.mozilla.org/releases/mozilla-release/rev/8320a78952e4
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 35
Verified as fixed using Firefox 32.0.1 (20140910203853) under Win 7 64-bit, Win 8 32-bit, Mac OSX 10.9.4 and Ubuntu 14.04 LTS 32-bit.
Verified as fixed using Firefox 32.0.1 build 2 (20140911151253), Firefox 33 beta 3 (20140911191954), latest Aurora 34.0a2 (20140911004004) and latest Nightly 35.0a1 (20140911064110) under Win 7 64-bit, Ubuntu 12.10 32-bit and Mac OSX 10.8.5.
This doesn't meet ESR landing criteria.  If it's needed for test-only, feel free to re-nominate.
You need to log in before you can comment on or make changes to this bug.